Probability & Statistical Methods - Unit 1 To 4 Material

The document discusses concepts related to conditional probability, independence, and Bayes' Theorem, providing definitions, examples, and applications. It explains the sample space, multiplication rule, and the law of total probability, alongside practical problems involving probability calculations. Additionally, it covers the Monty Hall problem and includes exercises for further understanding of the topics presented.

Conditional Probability, Independence, Bayes’ Theorem

18.05 Spring 2014

Sample Space Confusions

1. Sample space = set of all possible outcomes of an experiment.


2. The size of the set is NOT the sample space.
3. Outcomes can be sequences of numbers.
Examples.
1. Roll 5 dice: Ω = set of all sequences of 5 numbers between 1 and
6, e.g. (1, 2, 1, 3, 1, 5) ∈ Ω.
The size |Ω| = 6^5 is not a set.
2. Ω = set of all sequences of 10 birthdays,
e.g. (111, 231, 3, 44, 55, 129, 345, 14, 24, 14) ∈ Ω.
|Ω| = 365^10
3. For n some number, Ω = set of all sequences of n birthdays.
|Ω| = 365^n.

Conditional Probability
‘the probability of A given B’.

P(A|B) = P(A ∩ B) / P(B),  provided P(B) ≠ 0.

[Venn diagram: events A and B overlapping in A ∩ B]

Conditional probability: Abstractly and for coin example


Table/Concept Question
(Work with your tablemates, then everyone click in the answer.)

Toss a coin 4 times. Let


A = ‘at least three heads’
B = ‘first toss is tails’.

1. What is P(A|B)?
(a) 1/16 (b) 1/8 (c) 1/4 (d) 1/5

2. What is P(B|A)?
(a) 1/16 (b) 1/8 (c) 1/4 (d) 1/5

Table Question

“Steve is very shy and withdrawn, invariably


helpful, but with little interest in people, or in the
world of reality. A meek and tidy soul, he has a
need for order and structure and a passion for
detail.”∗
What is the probability that Steve is a librarian?
What is the probability that Steve is a farmer?
∗ From Judgment under uncertainty: heuristics and biases by Tversky and Kahneman.

Multiplication Rule, Law of Total Probability
Multiplication rule: P(A ∩ B) = P(A|B) · P(B).

Law of total probability: If B1 , B2 , B3 partition Ω then


P(A) = P(A ∩ B1 ) + P(A ∩ B2 ) + P(A ∩ B3 )
= P(A|B1 )P(B1 ) + P(A|B2 )P(B2 ) + P(A|B3 )P(B3 )


[Venn diagram: Ω partitioned into B1, B2, B3; A intersects each piece in A ∩ B1, A ∩ B2, A ∩ B3.]
Trees
Organize computations
Compute total probability
Compute Bayes’ formula
Example (game): An urn contains 5 red and 2 green balls. A random ball is selected and replaced by a ball of the other color; then a second ball is drawn.
1. What is the probability the second ball is red?
2. What is the probability the first ball was red given the second ball
was red?
First draw: R1 with probability 5/7, G1 with probability 2/7.
Second draw: given R1, R2 with probability 4/7 and G2 with probability 3/7; given G1, R2 with probability 6/7 and G2 with probability 1/7.
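A quick numeric check of both questions, using the law of total probability and Bayes' formula (a sketch; the variable names simply mirror the tree labels above):

# Urn game: 5 red and 2 green balls; the first ball drawn is replaced by one of the other color.
p_R1, p_G1 = 5/7, 2/7                    # first draw
p_R2_given_R1, p_R2_given_G1 = 4/7, 6/7  # second draw, after the swap

# 1. Law of total probability: P(R2) = P(R2|R1)P(R1) + P(R2|G1)P(G1)
p_R2 = p_R2_given_R1 * p_R1 + p_R2_given_G1 * p_G1

# 2. Bayes' formula: P(R1|R2) = P(R2|R1)P(R1) / P(R2)
p_R1_given_R2 = p_R2_given_R1 * p_R1 / p_R2

print(p_R2)           # 32/49 ≈ 0.653
print(p_R1_given_R2)  # 20/32 = 0.625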

Concept Question: Trees 1

[Probability tree: the root branches to A1 and A2; each of these branches to B1 and B2; and each of those branches to C1 and C2. The labels x, y, z mark branch probabilities at the first, second, and third levels, along the path through A1 and B2 to C1.]

1. The probability x represents

(a) P(A1 )
(b) P(A1 |B2 )
(c) P(B2 |A1 )
(d) P(C1 |B2 ∩ A1 ).

Concept Question: Trees 2

[Same probability tree as in Trees 1.]

2. The probability y represents

(a) P(B2 )
(b) P(A1 |B2 )
(c) P(B2 |A1 )
(d) P(C1 |B2 ∩ A1 ).

Concept Question: Trees 3

[Same probability tree as in Trees 1.]

3. The probability z represents

(a) P(C1 )
(b) P(B2 |C1 )
(c) P(C1 |B2 )
(d) P(C1 |B2 ∩ A1 ).

Concept Question: Trees 4

[Same probability tree as in Trees 1, with the node at the end of the path through A1, B2, and C1 circled.]

4. The circled node represents the event

(a) C1
(b) B2 ∩ C1
(c) A1 ∩ B2 ∩ C1
(d) C1 |B2 ∩ A1 .

Let’s Make a Deal with Monty Hall
One door hides a car, two hide goats.
The contestant chooses any door.
Monty always opens a different door with a goat. (He
can do this because he knows where the car is.)
The contestant is then allowed to switch doors if she
wants.
What is the best strategy for winning a car?
(a) Switch (b) Don’t switch (c) It doesn’t matter

Board question: Monty Hall

Organize the Monty Hall problem into a tree and compute


the probability of winning if you always switch.
Hint: first break the game into a sequence of actions.
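As a cross-check of the tree computation, here is a small Monte Carlo sketch (the simulation is our addition, not the tree method the slide asks for; door numbering is arbitrary):

import random

def monty_hall(switch, trials=100_000):
    # Estimate the probability of winning the car with a fixed strategy.
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)        # door hiding the car
        choice = random.randrange(3)     # contestant's first pick
        # Monty opens a door with a goat, different from the contestant's choice
        opened = next(d for d in range(3) if d != car and d != choice)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print(monty_hall(switch=True))   # ≈ 2/3
print(monty_hall(switch=False))  # ≈ 1/3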

Independence
Events A and B are independent if the probability that
one occurred is not affected by knowledge that the other
occurred.

Independence ⇔ P(A|B) = P(A)   (provided P(B) ≠ 0)

            ⇔ P(B|A) = P(B)   (provided P(A) ≠ 0)

            ⇔ P(A ∩ B) = P(A)P(B)   (for any A and B)

Table/Concept Question: Independence
(Work with your tablemates, then everyone click in the answer.)

Roll two dice and consider the following events


A = ‘first die is 3’
B = ‘sum is 6’
C = ‘sum is 7’
A is independent of
(a) B and C (b) B alone
(c) C alone (d) Neither B nor C.
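One way to settle this is to enumerate the 36 equally likely outcomes and test the product rule P(A ∩ X) = P(A)P(X) for X = B and X = C (a small sketch added here, not part of the original slide):

from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == 3       # first die is 3
B = lambda w: sum(w) == 6     # sum is 6
C = lambda w: sum(w) == 7     # sum is 7

for name, X in [("B", B), ("C", C)]:
    independent = prob(lambda w: A(w) and X(w)) == prob(A) * prob(X)
    print(name, independent)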

Bayes’ Theorem

Also called Bayes’ Rule and Bayes’ Formula.


Allows you to find P(A|B) from P(B|A), i.e. to ‘invert’
conditional probabilities.

P(A|B) = P(B|A) · P(A) / P(B)
Often compute the denominator P(B) using the law of
total probability.


Theorem 12.11 (Bayes’ Theorem)

If A1, A2, A3, ..., An are mutually exclusive and exhaustive events such that P(Ai) > 0, i = 1, 2, 3, ..., n and B is any event in which P(B) > 0, then

P(Ai | B) = P(Ai) P(B | Ai) / [ P(A1) P(B | A1) + P(A2) P(B | A2) + ... + P(An) P(B | An) ],  i = 1, 2, ..., n.

Proof

By the law of total probability of B we have

P(B) = P(A1) P(B | A1) + P(A2) P(B | A2) + ... + P(An) P(B | An),

and by the multiplication theorem, P(Ai ∩ B) = P(B | Ai) P(Ai).

By the definition of conditional probability,

P(Ai | B) = P(Ai ∩ B) / P(B) = P(Ai) P(B | Ai) / [ P(A1) P(B | A1) + ... + P(An) P(B | An) ].

The above formula gives the relationship between P(Ai | B) and P(B | Ai).

Example 12.26

A factory has two machines I and II. Machine I produces 40% of items of the output and Machine II produces 60% of the items.
Further 4% of items produced by Machine I are defective and 5% produced by Machine II are defective. An item is drawn at random. If
the drawn item is defective, find the probability that it was produced by Machine II. (See the previous example, compare the
questions).

Solution

Let A1 be the event that the items are produced by Machine-I and A2 be the event that items are produced by Machine-II. Let B be the event of drawing a defective item. Now we are asked to find the conditional probability P(A2 | B). Since A1, A2 are mutually exclusive and exhaustive events, by Bayes’ theorem,

P(A2 | B) = P(A2) P(B | A2) / [ P(A1) P(B | A1) + P(A2) P(B | A2) ].

We have

P(A1) = 0.40, P(B | A1) = 0.04
P(A2) = 0.60, P(B | A2) = 0.05

Therefore,

P(A2 | B) = (0.60)(0.05) / [ (0.40)(0.04) + (0.60)(0.05) ] = 0.030 / 0.046 ≈ 0.652.

Example 12.27

A construction company employs 2 executive engineers. Engineer-1 does the work for 60% of jobs of the company. Engineer-2 does
the work for 40% of jobs of the company. It is known from the past experience that the probability of an error when engineer-1 does the
work is 0.03, whereas the probability of an error in the work of engineer-2 is 0.04. Suppose a serious error occurs in the work, which
engineer would you guess did the work?

Solution

Let A1 and A2 be the events of the job being done by engineer-1 and engineer-2 of the company respectively. Let B be the event that the error occurs in the work.

We have to find the conditional probabilities P(A1 | B) and P(A2 | B) to compare their chances of error in the work.

From the given information, we have

P(A1) = 0.60, P(B | A1) = 0.03
P(A2) = 0.40, P(B | A2) = 0.04

A1 and A2 are mutually exclusive and exhaustive events.

Applying Bayes’ theorem,

P(A1 | B) = (0.60)(0.03) / [ (0.60)(0.03) + (0.40)(0.04) ] = 0.018 / 0.034 = 9/17 ≈ 0.53,
P(A2 | B) = (0.40)(0.04) / 0.034 = 0.016 / 0.034 = 8/17 ≈ 0.47.

Since P(A1 | B) > P(A2 | B), the chance of error by engineer-1 is greater than the chance of error by engineer-2. Therefore one may guess that the serious error would have been done by engineer-1.


Example 12.28

The chances of X, Y and Z becoming managers of a certain company are 4 : 2 : 3. The probabilities that bonus scheme will be
introduced if X, Y and Z become managers are 0.3, 0.5 and 0.4 respectively. If the bonus scheme has been introduced, what is the
probability that Z was appointed as the manager?

Solution

Let A1, A2 and A3 be the events of X, Y and Z becoming managers of the company respectively. Let B be the event that the bonus scheme
will be introduced.

We have to find the conditional probability P(A3 | B).

From the given chances 4 : 2 : 3,

P(A1) = 4/9, P(A2) = 2/9, P(A3) = 3/9,
P(B | A1) = 0.3, P(B | A2) = 0.5, P(B | A3) = 0.4.

Since A1, A2 and A3 are mutually exclusive and exhaustive events, applying Bayes’ theorem,

P(A3 | B) = (3/9)(0.4) / [ (4/9)(0.3) + (2/9)(0.5) + (3/9)(0.4) ] = 1.2 / 3.4 = 6/17 ≈ 0.353.

Example 12.29

A consulting firm rents car from three agencies such that 50% from agency L, 30% from agency M and 20% from agency N. If 90% of
the cars from L, 70% of cars from M and 60% of the cars from N are in good conditions (i) what is the probability that the firm will get
a car in good condition? (ii) if a car is in good condition, what is probability that it has come from agency N?

Solution

Let A1, A2, and A3 be the events that the cars are rented from the agencies L, M and N respectively.

Let G be the event of getting a car in good condition.

We have to find

(i) the total probability of event G, that is, P(G);

(ii) the conditional probability of A3 given G, that is, P(A3 | G). We have

P(A1) = 0.50, P(G | A1) = 0.90
P(A2) = 0.30, P(G | A2) = 0.70
P(A3) = 0.20, P(G | A3) = 0.60.

(i) Since A1, A2 and A3 are mutually exclusive and exhaustive events and G is an event in S, the total probability of event G is P(G).

P(G) = P(A1) P(G | A1) + P(A2) P(G | A2) + P(A3) P(G | A3)
     = (0.50)(0.90) + (0.30)(0.70) + (0.20)(0.60)
     = 0.78.

(ii) The conditional probability of A3 given G is P(A3 | G).

By Bayes’ theorem,

P(A3 | G) = P(A3) P(G | A3) / P(G) = (0.20)(0.60) / 0.78 = 0.12 / 0.78 = 2/13 ≈ 0.154.
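The same pattern of computation runs through Examples 12.26-12.29, so it is convenient to wrap it in a small helper; the sketch below (the function name posterior is ours) is checked against Example 12.29:

def posterior(priors, likelihoods):
    # Bayes' theorem: returns P(B) and the list of P(Ai | B) from P(Ai) and P(B | Ai).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                   # law of total probability, P(B)
    return total, [j / total for j in joint]

# Example 12.29: agencies L, M, N
p_good, post = posterior([0.50, 0.30, 0.20], [0.90, 0.70, 0.60])
print(p_good)    # 0.78
print(post[2])   # P(A3 | G) ≈ 0.154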

EXERCISE 12.4

(1) A factory has two Machines-I and II. Machine-I produces 60% of items and Machine-II produces 40% of the items of the total
output. Further 2% of the items produced by Machine-I are defective whereas 4% produced by Machine-II are defective. If an item is
drawn at random what is the probability that it is defective?

(2) There are two identical urns containing respectively 6 black and 4 red balls, 2 black and 2 red balls. An urn is chosen at random and
a ball is drawn from it. (i) find the probability that the ball is black (ii) if the ball is black, what is the probability that it is from the first
urn?

(3) A firm manufactures PVC pipes in three plants viz, X, Y and Z. The daily production volumes from the three firms X, Y and Z are
respectively 2000 units, 3000 units and 5000 units. It is known from the past experience that 3% of the output from plant X, 4% from
plant Y and 2% from plant Z are defective. A pipe is selected at random from a day’s total production,

Chapter 1

Estimation theory

In this chapter, an introduction to estimation theory is provided. The objec-


tive of an estimation problem is to infer the value of an unknown quantity,
by using information concerning other quantities (the data).
Depending on the type of a priori information available on the unknown
quantity to be estimated, two different settings can be considered:

• Parametric estimation;

• Bayesian estimation.

Paragraphs 1.1-1.5 discuss parametric estimation problems, while paragraph


1.6 concerns the Bayesian estimation framework.

1.1 Parametric estimation


The aim of a parametric estimation problem is to estimate a deterministic
quantity θ from observations of the random variables y 1 , . . . y n .

1.1.1 Problem formulation


Let:

- θ ∈ Θ ⊆ Rp , an unknown vector of parameters;


- y = (y 1 , . . . y n )T ∈ Y ⊆ Rn a vector of random variables, hereafter


called observations or measurements;

- F_y^θ(y), f_y^θ(y): the cumulative distribution function and the probability density function, respectively, of the observation vector y, which depend on the unknown vector θ.

The set Θ, to which the parameter vector θ belongs, is referred to as


the parameter set. It represents the a priori information available on the
admissible values of the vector θ. If all values are admissible, Θ = Rp .
The set Y, containing all the values that the random vector y may take,
is known as observation set. It is assumed that the cdf Fyθ (y) (or equivalently
the pdf fyθ (y)) is parameterized by the p parameters θ ∈ Rp (which means
that such parameters enter in the expressions of those functions). Hereafter,
the word parameter will be used to denote the entire unknown vector θ. To
emphasize the special case p = 1, we will sometimes use the expression scalar
parameter.
We are now ready to formulate the general version of a parametric esti-
mation problem.

Problem 1.1. Estimate the unknown parameter θ ∈ Θ, by using an obser-


vation y of the random vector y ∈ Y.

In order to solve Problem 1.1, one has to construct an estimator.

Definition 1.1. An estimator T (·) of the parameter θ is a function that


maps the set of observations to the parameter set:

T : Y → Θ.

The value θ̂ = T (y), returned by the estimator when applied to the observa-
tion y of y, is called estimate of θ.

An estimator T (·) defines a rule that associates to each realization y of


the measurement vector y, the quantity θ̂ = T (y) which is an estimate of θ.

Notice that θ̂ can be seen as a realization of the random variable T (y);


in fact, since T (y) is a function of the random variable y, the estimate θ̂ is
a random variable itself.

1.1.2 Properties of an estimator


According to Definition 1.1, the class of possible estimators is infinite. In
order to characterize the quality of an estimator, it is useful to introduce
some desired properties.

Unbiasedness

A first desirable property is that the expected value of the estimate θ̂ = T (y)
be equal to the actual value of the parameter θ.

Definition 1.2. An estimator T (y) of the parameter θ is unbiased (or cor-


rect) if
Eθ [T (y)] = θ, ∀θ ∈ Θ. (1.1)

In the above definition we used the notation Eθ [·], which stresses the
dependency on θ of the expected value of T (y), due to the fact that the pdf
of y is parameterized by θ itself.
The unbiasedness condition (1.1) guarantees that the estimator T(·) does not introduce systematic errors, i.e., errors that are not averaged out even when considering an infinite amount of observations of y. In other words, T(·) neither overestimates nor underestimates θ, on average (see Fig. 1.1).

Example 1.1. Let y_1, ..., y_n be random variables with mean m. The quantity

y = (1/n) Σ_{i=1}^n y_i    (1.2)

is the so-called sample mean. It is easy to verify that y is an unbiased


estimator of m. Indeed, due to the linearity of the expected value operator,

Figure 1.1: Probability density function of an unbiased estimator and of a biased one.

one has

E[y] = E[(1/n) Σ_{i=1}^n y_i] = (1/n) Σ_{i=1}^n E[y_i] = (1/n) Σ_{i=1}^n m = m.

Example 1.2. Let y_1, ..., y_n be scalar random variables, independent and identically distributed (i.i.d.) with mean m and variance σ². The quantity

σ̂_y² = (1/n) Σ_{i=1}^n (y_i − y)²

is a biased estimator of the variance σ². Indeed, from (1.2) one has

E[σ̂_y²] = E[ (1/n) Σ_{i=1}^n ( y_i − (1/n) Σ_{j=1}^n y_j )² ]
        = (1/n) Σ_{i=1}^n (1/n²) E[ ( n y_i − Σ_{j=1}^n y_j )² ]
        = (1/n) Σ_{i=1}^n (1/n²) E[ ( n(y_i − m) − Σ_{j=1}^n (y_j − m) )² ].

However,

E[ ( n(y_i − m) − Σ_{j=1}^n (y_j − m) )² ] = n² E[(y_i − m)²]
        − 2n E[ (y_i − m) Σ_{j=1}^n (y_j − m) ] + E[ ( Σ_{j=1}^n (y_j − m) )² ]
        = n²σ² − 2nσ² + nσ²
        = n(n − 1)σ²,

because, by the independence assumption, E[(y_i − m)(y_j − m)] = 0 for i ≠ j. Therefore,

E[σ̂_y²] = (1/n) Σ_{i=1}^n (1/n²) n(n − 1)σ² = ((n − 1)/n) σ² ≠ σ².

Example 1.3. Let y 1 , . . . , y n be i.i.d. scalar random variables, with mean


m and variance σ². The quantity

S² = (1/(n − 1)) Σ_{i=1}^n (y_i − y)²

is called the sample variance. It is straightforward to verify that S² is an unbiased estimator of the variance σ². In fact, observing that

S² = (n/(n − 1)) σ̂_y²,

one has immediately

E[S²] = (n/(n − 1)) E[σ̂_y²] = (n/(n − 1)) ((n − 1)/n) σ² = σ².

Notice that, if T (·) is an unbiased estimator of θ, then g(T (·)) is not in


general an unbiased estimator of g(θ), unless g(·) is a linear function.
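A short simulation contrasting the two variance estimators of Examples 1.2 and 1.3 (a sketch; the Gaussian data, the sample size n = 10 and the number of trials are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(0)
m, sigma2, n, trials = 1.0, 4.0, 10, 200_000

y = rng.normal(m, np.sqrt(sigma2), size=(trials, n))
ybar = y.mean(axis=1, keepdims=True)

sigma_hat2 = ((y - ybar) ** 2).mean(axis=1)      # divide by n   (biased, Example 1.2)
S2 = ((y - ybar) ** 2).sum(axis=1) / (n - 1)     # divide by n-1 (unbiased, Example 1.3)

print(sigma_hat2.mean())   # ≈ (n-1)/n * sigma2 = 3.6
print(S2.mean())           # ≈ sigma2 = 4.0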

Consistency

Another desirable property of an estimator is to provide an estimate that


converges to the actual value of θ as the number of measurements grows.
Being the estimate a random variable, we need to introduce the notion of
convergence in probability.

Definition 1.3. Let {y_i}_{i=1}^∞ be a sequence of random variables. The sequence of estimators θ̂_n = T_n(y_1, ..., y_n) of θ is said to be consistent if θ̂_n converges in probability to θ, for all admissible values of θ, i.e.

lim_{n→∞} P( |θ̂_n − θ| ≥ ε ) = 0,  ∀ε > 0, ∀θ ∈ Θ.

Figure 1.2: Probability density function of a consistent estimator, shown for n = 20, 50, 100, 500.

Notice that consistency is an asymptotic property of an estimator. It guarantees that, as the number of data points goes to infinity, the probability that the estimate differs from the actual value of the parameter goes to zero (see Fig. 1.2).
The next Theorem provides a sufficient condition for consistency of un-
biased estimators.

Theorem 1.1. Let θ̂_n be a sequence of unbiased estimators of the scalar parameter θ:

E[θ̂_n] = θ,  ∀n, ∀θ ∈ Θ.

If

lim_{n→∞} E[(θ̂_n − θ)²] = 0,

then the sequence θ̂_n is consistent.


Proof
For a random variable x, the Chebyshev inequality holds:

P( |x − m_x| ≥ ε ) ≤ (1/ε²) E[(x − m_x)²].

Therefore, one has

lim_{n→∞} P( |θ̂_n − θ| ≥ ε ) ≤ lim_{n→∞} (1/ε²) E[(θ̂_n − θ)²],

from which the result follows immediately. □


Therefore, for a sequence of unbiased estimators to be consistent, it is suffi-
cient that the variance of the estimates goes to zero as the number of mea-
surements grows.

Example 1.4. Let y_1, ..., y_n be i.i.d. random variables with mean m and variance σ². In Example 1.1 it has been shown that the sample mean

y = (1/n) Σ_{i=1}^n y_i

is an unbiased estimator of the mean m. Let us now show that it is also a consistent estimator of m. The variance of the estimate is given by

Var(y) = E[(y − m)²] = E[ ( (1/n) Σ_{i=1}^n y_i − m )² ]
       = (1/n²) E[ ( Σ_{i=1}^n (y_i − m) )² ] = σ²/n,

because the random variables y_i are independent. Therefore,

Var(y) = σ²/n → 0 as n → ∞.
Hence, due to Theorem 1.1, the sample mean y is a consistent estimator of
the mean m. △

The result in Example 1.4 is a special case of the following more general
celebrated result.

Theorem 1.2. (Law of large numbers)


Let {y_i}_{i=1}^∞ be a sequence of independent random variables with mean m and
finite variance. Then, the sample mean y converges to m in probability.

Mean square error

A criterion for measuring the quality of the estimate provided by an estimator


is the Mean Square Error. Let us first consider the case of a scalar parameter
(θ ∈ R).

Definition 1.4. Let θ ∈ R. The Mean Square Error (MSE) of an estimator T(·) is defined as

MSE(T(·)) = Eθ[(T(y) − θ)²].

Notice that if an estimator is unbiased, then the MSE is equal to the


variance of the estimate T (y), and also to the variance of the estimation
error T (y) − θ. On the other hand, for a biased estimator one has

MSE(T(·)) = Eθ[(T(y) − m_{T(y)} + m_{T(y)} − θ)²]
          = Eθ[(T(y) − m_{T(y)})²] + (m_{T(y)} − θ)²,

where m_{T(y)} = E[T(y)]. The above expression shows that the MSE of a biased estimator is the sum of the variance of the estimator and of the square of the deterministic quantity m_{T(y)} − θ, which is called the bias error. As
we will see, the trade off between the variance of the estimator and the bias
error is a fundamental limitation in many practical estimation problems.
The MSE can be used to decide which estimator is better within a family
of estimators.

Definition 1.5. Let T1 (·) and T2 (·) be two estimators of the parameter θ.
Then, T1 (·) is uniformly preferable to T2 (·) if

Eθ[(T_1(y) − θ)²] ≤ Eθ[(T_2(y) − θ)²],  ∀θ ∈ Θ.

It is worth stressing that in order to be preferable to other estimators, an


estimator must provide a smaller MSE for all the admissible values of the
parameter θ.
The above definitions can be extended quite naturally to the case of a
parameter vector θ ∈ Rp .

Definition 1.6. Let θ ∈ R^p. The Mean Square Error (MSE) of an estimator T(·) is defined as

MSE(T(·)) = Eθ[ ‖T(y) − θ‖² ]
          = Eθ[ tr{(T(y) − θ)(T(y) − θ)^T} ],

where tr(M) denotes the trace of the matrix M.

The concept of uniformly preferable estimator is analogous to that in Definition 1.5. It can also be defined in terms of an inequality between the corresponding covariance matrices, i.e., T_1(·) is uniformly preferable to T_2(·) if

Eθ[(T_1(y) − θ)(T_1(y) − θ)^T] ≤ Eθ[(T_2(y) − θ)(T_2(y) − θ)^T],

where the matrix inequality A ≤ B means that B−A is a positive semidefinite


matrix.

1.1.3 Minimum variance unbiased estimator


Let us restrict our attention to unbiased estimators. Since we have introduced
the concept of mean square error, it is natural to look for the estimator which
minimizes this performance index.

Definition 1.7. An unbiased estimator T*(·) of the scalar parameter θ is a Uniformly Minimum Variance Unbiased Estimator (UMVUE) if

Eθ[(T*(y) − θ)²] ≤ Eθ[(T(y) − θ)²],  ∀θ ∈ Θ    (1.3)

for all unbiased estimators T (·) of θ.



Notice that for an estimator to be UMVUE, it has to satisfy the following


conditions:

• be unbiased;

• have minimum variance among all unbiased estimators;

• the previous condition must hold for every admissible value of the pa-
rameter θ.

Unfortunately, there are many problems for which there does not exist any UMVUE estimator. For this reason, we often restrict the class of estimators, in order to find the best one within the considered class. A popular choice is that of linear estimators, i.e., those taking the form

T(y) = Σ_{i=1}^n a_i y_i,    (1.4)

with ai ∈ R.

Definition 1.8. A linear unbiased estimator T*(·) of the scalar parameter θ is the Best Linear Unbiased Estimator (BLUE) if

Eθ[(T*(y) − θ)²] ≤ Eθ[(T(y) − θ)²],  ∀θ ∈ Θ

for every linear unbiased estimator T (·) of θ.

Differently from the UMVUE estimator, the BLUE estimator takes on a


simple form and can be easily computed (one has just to find the optimal
values of the coefficients ai ).

Example 1.5. Let y_i be independent random variables with mean m and variance σ_i², i = 1, . . . , n. Assume the variances σ_i² are known. Let us compute the BLUE estimator of m. Being the estimator linear, it takes on the form (1.4). In order to be unbiased, T(·) must satisfy

Eθ[T(y)] = Eθ[ Σ_{i=1}^n a_i y_i ] = Σ_{i=1}^n a_i Eθ[y_i] = m Σ_{i=1}^n a_i = m.

Therefore, we must enforce the constraint

Σ_{i=1}^n a_i = 1.    (1.5)

Now, among all the estimators of form (1.4), with the coefficients a_i satisfying (1.5), we need to find the minimum variance one. Being the observations y_i independent, the variance of T(y) is given by

Eθ[(T(y) − m)²] = Eθ[ ( Σ_{i=1}^n a_i y_i − m )² ] = Σ_{i=1}^n a_i² σ_i².

Summing up, in order to determine the BLUE estimator, we have to solve the following constrained optimization problem:

min_{a_i} Σ_{i=1}^n a_i² σ_i²
s.t. Σ_{i=1}^n a_i = 1.

Let us write the Lagrangian function

L(a_1, ..., a_n, λ) = Σ_{i=1}^n a_i² σ_i² + λ ( Σ_{i=1}^n a_i − 1 )

and compute the stationary points by imposing

∂L(a_1, ..., a_n, λ)/∂a_i = 0,  i = 1, ..., n    (1.6)
∂L(a_1, ..., a_n, λ)/∂λ = 0.    (1.7)

From (1.7) we obtain the constraint (1.5), while (1.6) implies that

2ai σi2 + λ = 0, i = 1, . . . , n

from which

λ = −1 / Σ_{i=1}^n (1/(2σ_i²)),    (1.8)

a_i = (1/σ_i²) / Σ_{j=1}^n (1/σ_j²),  i = 1, ..., n.    (1.9)

Therefore, the BLUE estimator of the mean m is given by

m̂_BLUE = ( 1 / Σ_{i=1}^n (1/σ_i²) ) Σ_{i=1}^n (1/σ_i²) y_i.    (1.10)

Notice that if all the measurements have the same variance σi2 = σ 2 , the
estimator m̂BLU E boils down to the sample mean y. This means that the
BLUE estimator can be seen as a generalization of the sample mean, in
the case when the measurements y i have different accuracy (i.e., different
variance σi2 ). In fact, the BLUE estimator is a weighted average of the ob-
servations, in which the weights are inversely proportional to the variance of
the measurements or, seen another way, directly proportional to the precision
of each observation. Let us assume that for a certain i, σ_i² → ∞. This means that the measurement y_i is completely unreliable. Then, the weight 1/σ_i² of y_i within m̂_BLUE will tend to zero. On the other hand, for an infinitely precise measurement y_j (σ_j² → 0), the corresponding weight 1/σ_j² will be predominant over all the other weights and the BLUE estimate will approach that measurement, i.e., m̂_BLUE ≃ y_j. △
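A minimal numerical sketch of the estimator (1.10), under the assumption of Gaussian measurements with known but different variances (all numbers below are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
m_true = 5.0
sigma2 = np.array([0.1, 0.1, 1.0, 4.0, 25.0])    # known variances of y_1, ..., y_n
y = m_true + rng.normal(0.0, np.sqrt(sigma2))    # one noisy measurement per sensor

w = 1.0 / sigma2
m_blue = np.sum(w * y) / np.sum(w)   # inverse-variance weighted average, as in (1.10)
m_mean = y.mean()                    # plain sample mean, for comparison

print(m_blue, m_mean)   # the BLUE estimate gives almost no weight to the noisiest measurements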

1.2 Cramér-Rao bound


This paragraph introduces a fundamental result which establishes a lower
bound to the variance of every unbiased estimator of the parameter θ.

Theorem 1.3. (Cramér-Rao bound) Let T(·) be an unbiased estimator of the scalar parameter θ based on the observations y of the random variables y ∈ Y, and let the observation set Y be independent of θ. Then, under some technical regularity assumptions (see (Rohatgi and Saleh, 2001)), it holds that

Eθ[(T(y) − θ)²] ≥ [I_n(θ)]^{-1},    (1.11)

where

I_n(θ) = Eθ[ ( ∂ ln f_y^θ(y) / ∂θ )² ]    (1.12)

is called the Fisher information. Moreover, if the observations y_1, ..., y_n are independent and identically distributed with the same pdf f_{y_1}^θ(y_1), one has

I_n(θ) = n I_1(θ).

When θ is a p-dimensional vector, the Cramér-Rao bound (1.11) becomes

Eθ[(T(y) − θ)(T(y) − θ)^T] ≥ [I_n(θ)]^{-1},

where the inequality must be intended in the matricial sense and the matrix I_n(θ) ∈ R^{p×p} is the so-called Fisher information matrix

I_n(θ) = Eθ[ ( ∂ ln f_y^θ(y) / ∂θ ) ( ∂ ln f_y^θ(y) / ∂θ )^T ].

Notice that the matrix Eθ[(T(y) − θ)(T(y) − θ)^T] is the covariance matrix of the unbiased estimator T(·).
Theorem 1.3 states that there does not exist any unbiased estimator with variance smaller than [I_n(θ)]^{-1}. Notice that I_n(θ) depends, in general, on the actual value of the parameter θ (because the partial derivatives must be evaluated at θ), which is unknown. For this reason, an approximation of the lower bound is usually computed in practice, by replacing θ with an estimate θ̂. Nevertheless, the Cramér-Rao bound is also important because it allows one to define the key concept of efficiency of an estimator.

Definition 1.9. An unbiased estimator T(·) is efficient if its variance achieves the Cramér-Rao bound, i.e.

Eθ[(T(y) − θ)²] = [I_n(θ)]^{-1}.

An efficient estimator has the least possible variance among all unbiased estimators (therefore, it is also a UMVUE).
In the special case of i.i.d. observations y_i, Theorem 1.3 states that I_n(θ) = n I_1(θ), where I_1(θ) is the Fisher information of a single observation. Therefore, for a fixed θ, the Cramér-Rao bound decreases as 1/n as the number of observations n grows.

Example 1.6. Let y_1, ..., y_n be i.i.d. random variables with mean m_y and variance σ_y². In Examples 1.1 and 1.4, we have seen that the sample mean

y = (1/n) Σ_{i=1}^n y_i

is a consistent unbiased estimator of the mean m_y. Being the observations i.i.d., from Theorem 1.3 one has

Eθ[(y − m_y)²] = σ_y²/n ≥ [I_n(θ)]^{-1} = [I_1(θ)]^{-1}/n.

Let us now assume that the y_i are distributed according to the Gaussian pdf

f_{y_i}(y_i) = ( 1/(√(2π) σ_y) ) e^{ −(y_i − m_y)²/(2σ_y²) }.

Let us compute the Fisher information of a single measurement

I_1(θ) = Eθ[ ( ∂ ln f_{y_1}^θ(y_1) / ∂θ )² ].

In this example, the unknown parameter to be estimated is the mean θ = m. Therefore,

∂ ln f_{y_1}^θ(y_1) / ∂θ = ∂/∂m [ ln( 1/(√(2π) σ_y) ) − (y_1 − m)²/(2σ_y²) ] |_{m=m_y} = (y_1 − m_y)/σ_y²,

and hence,

I_1(θ) = Eθ[ (y_1 − m_y)²/σ_y⁴ ] = 1/σ_y².

The Cramér-Rao bound takes on the value

[I_n(θ)]^{-1} = [I_1(θ)]^{-1}/n = σ_y²/n,

which is equal to the variance of the estimator y. Therefore, we can conclude that: in the case of i.i.d. Gaussian observations, the sample mean is an efficient estimator of the mean. △

1.3 Maximum Likelihood Estimator


In general, for a given parametric estimation problem, an efficient estimator
may not exist. In Example 1.6, it has been shown that the Cramér-Rao
bound allows one to check if an estimator is efficient. However, it remains
unclear how to find suitable candidates for efficient estimators and, in the
case that such candidates turn out to be not efficient, whether it is possible
to conclude that for the problem at hand there are no efficient estimators.
An answer to these questions is provided by the class of Maximum Likelihood
estimators.

Definition 1.10. Let y be a vector of observations with pdf fyθ (y), depend-
ing on the unknown parameter θ ∈ Θ. The likelihood function is defined
as
L(θ|y) = fyθ (y) .

It is worth remarking that, once the realization y of the random variable


y has been observed (i.e., after the data have been collected), the likelihood
function depends only on the unknown parameter θ (indeed, we refer to
L(θ|y) as the likelihood of θ “given” y).
A meaningful way to estimate θ is to choose the value that maximizes the
probability of the observed data. In fact, by exploiting the meaning of the

probability density function, maximizing f_y^θ(y) with respect to θ corresponds to choosing θ in such a way that the measurement y has the highest possible probability of having been observed, among all feasible scenarios θ ∈ Θ.

Definition 1.11. The Maximum Likelihood (ML) estimator of the unknown


parameter θ is given by

T_ML(y) = arg max_{θ∈Θ} L(θ|y).

In several problems, in order to ease the computation, it may be conve-


nient to maximize the so-called log-likelihood function:

ln L(θ|y).

Being the natural logarithm a monotonically increasing function, L(θ|y) and ln L(θ|y) achieve their maxima at the same values.

Remark 1.1. Assuming that the pdf f_y^θ(y) is a differentiable function of θ = (θ_1, ..., θ_p) ∈ Θ ⊆ R^p, with Θ an open set, if θ̂ is a maximum for L(θ|y), it has to be a solution of the equations

∂L(θ|y)/∂θ_i |_{θ=θ̂} = 0,  i = 1, ..., p    (1.13)

or equivalently of

∂ ln L(θ|y)/∂θ_i |_{θ=θ̂} = 0,  i = 1, ..., p.    (1.14)

It is worth observing that in many problems, even for a scalar parameter (p = 1), equation (1.13) may admit more than one solution. It may also happen that the likelihood function is not differentiable everywhere in Θ, or that Θ is not an open set, in which case the maximum can be achieved on the boundary of Θ. For all these reasons, the computation of the maximum likelihood estimator requires studying the function L(θ|y) over the entire domain Θ (see Exercise 1.5). Clearly, this may be a formidable task for high dimensional parameter vectors.

Example 1.7. Let y_1, ..., y_n be independent Gaussian random variables, with unknown mean m_y and known variance σ_y². Let us compute the ML estimator of the mean m_y.
Being the measurements independent, the likelihood is given by

L(θ|y) = f_y^θ(y) = Π_{i=1}^n ( 1/(√(2π) σ_y) ) e^{ −(y_i − m)²/(2σ_y²) }.

In this case, it is convenient to maximize the log-likelihood, which takes on the form

ln L(θ|y) = Σ_{i=1}^n [ ln( 1/(√(2π) σ_y) ) − (y_i − m)²/(2σ_y²) ]
          = n ln( 1/(√(2π) σ_y) ) − Σ_{i=1}^n (y_i − m)²/(2σ_y²).

By imposing the condition (1.14), one gets

∂ ln L(θ|y)/∂θ = ∂/∂m [ n ln( 1/(√(2π) σ_y) ) − Σ_{i=1}^n (y_i − m)²/(2σ_y²) ] |_{m=m̂_ML} = 0,

from which

Σ_{i=1}^n (y_i − m̂_ML)/σ_y² = 0,

and hence

m̂_ML = (1/n) Σ_{i=1}^n y_i.

Therefore, in this case the ML estimator coincides with the sample mean.
Since the observations are i.i.d. Gaussian variables, this estimator is also ef-
ficient (see Example 1.6). △

The result in Example 1.7 is not restricted to the specific setting or pdf
considered. The following general theorem illustrates the importance of max-
imum likelihood estimators, in the context of parametric estimation.

Theorem 1.4. Under the same assumptions for which the Cramér-Rao bound
holds, if there exists an efficient estimator T ∗ (·), then T ∗ (·) is a maximum
likelihood estimator.

Therefore, if we are looking for an efficient estimator, the only candidates


are maximum likelihood estimators.

Example 1.8. Let y_1, ..., y_n be independent Gaussian random variables, with mean m_y and variance σ_y², both unknown. Let us compute the Maximum Likelihood estimator of the mean and the variance.
Similarly to what was observed in Example 1.7, the log-likelihood turns out to be

ln L(θ|y) = n ln( 1/√(2πσ²) ) − Σ_{i=1}^n (y_i − m)²/(2σ²).

The unknown parameter vector to be estimated is θ = (m, σ²)^T, for which condition (1.14) becomes

∂ ln L(θ|y)/∂θ_1 = ∂/∂m [ n ln( 1/√(2πσ²) ) − Σ_{i=1}^n (y_i − m)²/(2σ²) ] |_{(m=m̂_ML, σ²=σ̂²_ML)} = 0,
∂ ln L(θ|y)/∂θ_2 = ∂/∂σ² [ n ln( 1/√(2πσ²) ) − Σ_{i=1}^n (y_i − m)²/(2σ²) ] |_{(m=m̂_ML, σ²=σ̂²_ML)} = 0.

By differentiating with respect to m and σ², one gets

Σ_{i=1}^n (y_i − m̂_ML)/σ̂²_ML = 0,
−n/(2σ̂²_ML) + (1/(2σ̂⁴_ML)) Σ_{i=1}^n (y_i − m̂_ML)² = 0,

from which

m̂_ML = (1/n) Σ_{i=1}^n y_i,
σ̂²_ML = (1/n) Σ_{i=1}^n (y_i − m̂_ML)².

Although Eθ[m̂_ML] = m_y (see Example 1.1), one has Eθ[σ̂²_ML] = ((n − 1)/n) σ_y² (see Example 1.2). Therefore, in this case, the Maximum Likelihood estimator is biased and hence it is not efficient. Due to Theorem 1.4, we can conclude that there does not exist any efficient estimator for the parameter θ = (m, σ²)^T. △

The previous example shows that Maximum Likelihood estimators can


be biased. However, besides the motivations provided by Theorem 1.4, there
exist other reasons that make such estimators attractive.

Theorem 1.5. If the random variables y_1, ..., y_n are i.i.d., then (under suitable technical assumptions)

lim_{n→+∞} √(I_n(θ)) (T_ML(y) − θ)

is a random variable with standard normal distribution N(0, 1).

Theorem 1.5 states that the maximum likelihood estimator is:

• asymptotically unbiased;

• consistent;

• asymptotically efficient;

• asymptotically normal.

1.4 Nonlinear estimation with additive noise


A popular class of estimation problems is the one in which the aim is to esti-
mate a parameter θ, by using n measurements y = (y 1 , . . . , y n )T corrupted
by additive noise. Formally, let

h : Θ ⊆ Rp → Rn

be a deterministic function of θ. The aim is to estimate θ by using the


observations
y = h(θ) + ε

where ε ∈ R^n represents the measurement noise, modeled as a vector of random variables with pdf f_ε(ε).
Under these assumptions, the likelihood function is given by

L(θ|y) = f_y^θ(y) = f_ε(y − h(θ)).

In the case in which the measurement noise ε is distributed according to the Gaussian pdf

f_ε(ε) = ( 1 / ((2π)^{n/2} (det Σ_ε)^{1/2}) ) e^{ −(1/2) ε^T Σ_ε^{-1} ε }

with zero mean and known covariance matrix Σ_ε, the log-likelihood function takes on the form

ln L(θ|y) = K − (1/2) (y − h(θ))^T Σ_ε^{-1} (y − h(θ)),
where K is a constant that does not depend on θ. The computation of
the maximum likelihood estimator boils down to the following optimization
problem

θ̂_ML = arg max_θ ln L(θ|y) = arg min_θ (y − h(θ))^T Σ_ε^{-1} (y − h(θ)).    (1.15)

Being h(·), in general, a nonlinear function of θ, the solution of (1.15) can be computed by resorting to numerical methods. Clearly, the computational complexity depends not only on the number p of parameters to be estimated and on the size n of the data set, but also on the structure of h(·). For example, if the resulting cost function is convex, there are efficient algorithms that allow one to solve problems with very large n and p, while if it is nonconvex the problem may become intractable even for relatively small values of p.
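As an illustration, the sketch below solves (1.15) numerically for one possible nonlinear model, h(θ) = exp(−θ t) sampled at times t_i; the model, the simulated data and the use of scipy.optimize.minimize are our assumptions, not part of the text:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
t = np.linspace(0.0, 3.0, 30)
theta_true = 0.8
Sigma = 0.05 ** 2 * np.eye(t.size)               # known noise covariance
y = np.exp(-theta_true * t) + rng.multivariate_normal(np.zeros(t.size), Sigma)

def h(theta):
    return np.exp(-theta * t)

def cost(theta):
    # (y - h(θ))^T Σ^{-1} (y - h(θ)), the objective of (1.15)
    r = y - h(theta[0])
    return r @ np.linalg.solve(Sigma, r)

res = minimize(cost, x0=[0.1], method="Nelder-Mead")
print(res.x)   # ≈ theta_true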

1.5 Linear estimation problems


An interesting scenario is the one in which the relationship between the un-
known parameters and the data is linear, i.e. h(θ) = U θ, where U ∈ Rn×p .

In this case, the measurement equation takes on the form

y = Uθ + ε. (1.16)

In the following, we will assume that rank(U) = p, which means that the
number of linearly independent measurements is not smaller than the number
of parameters to be estimated (otherwise, the problem is ill posed).
We now introduce two popular estimators that can be used to estimate
θ in the setting (1.16). We will discuss their properties, depending on the
assumptions we make on the measurement noise ε. Let us start with the
Least Squares estimator.

Definition 1.12. Let y be a vector of random variables related to θ according


to (1.16). The estimator

TLS (y) = (U T U)−1 U T y (1.17)

is called Least Squares (LS) estimator of the parameter θ.

The name of this estimator comes from the fact that it minimizes the sum of the squared differences between the data realization y and the model Uθ, i.e.

θ̂_LS = arg min_θ ‖y − Uθ‖².

Indeed,

‖y − Uθ‖² = (y − Uθ)^T (y − Uθ) = y^T y + θ^T U^T U θ − 2 y^T U θ.

By differentiating with respect to θ, one gets

∂‖y − Uθ‖²/∂θ |_{θ=θ̂_LS} = 2 θ̂_LS^T U^T U − 2 y^T U = 0,

where the properties ∂(x^T A x)/∂x = 2 x^T A (for symmetric A) and ∂(Ax)/∂x = A have been exploited. By solving with respect to θ̂_LS^T, one gets

θ̂_LS^T = y^T U (U^T U)^{-1}.

Finally, by transposing the above expression and taking into account that
the matrix (U T U) is symmetric, one obtains the equation (1.17).
It is worth stressing that the LS estimator does not require any a priori information about the noise ε in order to be computed. As we will see in the sequel, however, the properties of ε will influence those of the LS estimator.
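A minimal numerical sketch of (1.17) for a straight-line model y_i = θ_1 + θ_2 t_i + ε_i (the specific model and numbers are our own illustrative choices):

import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 50)
U = np.column_stack([np.ones_like(t), t])     # n x p regressor matrix, rank(U) = p = 2
theta_true = np.array([1.0, -2.0])
y = U @ theta_true + rng.normal(0.0, 0.1, size=t.size)

theta_ls = np.linalg.solve(U.T @ U, U.T @ y)  # (U^T U)^{-1} U^T y, as in (1.17)
print(theta_ls)                               # ≈ [1.0, -2.0]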

Definition 1.13. Let y be a vector of random variables related to θ according


to (1.16). Let Σ_ε be the covariance matrix of ε. The estimator

T_GM(y) = (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} y    (1.18)

is called Gauss-Markov (GM) estimator (or Weighted Least Squares Estima-


tor) of the parameter θ.

Similarly to what has been shown for the LS estimator, it is easy to verify that the GM estimator minimizes the weighted sum of squared errors between y and Uθ, i.e.

θ̂_GM = arg min_θ (y − Uθ)^T Σ_ε^{-1} (y − Uθ).

Notice that the Gauss-Markov estimator requires the knowledge of the co-
variance matrix Σε of the measurement noise. By using this information, the
measurements are weighted with a matricial weight that is inversely propor-
tional to their uncertainty.
Under the assumption that the noise has zero mean, E[ε] = 0, it is easy to show that both the LS and the GM estimator are unbiased. For the LS estimator one has

Eθ[θ̂_LS] = Eθ[ (U^T U)^{-1} U^T y ] = Eθ[ (U^T U)^{-1} U^T (Uθ + ε) ]
          = Eθ[ θ + (U^T U)^{-1} U^T ε ] = θ.

For the GM estimator,

Eθ[θ̂_GM] = Eθ[ (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} y ] = Eθ[ (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} (Uθ + ε) ]
          = Eθ[ θ + (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} ε ] = θ.

If the noise vector ε has non-zero mean, mε = E [ε], but the mean mε
is known, the LS and GM estimators can be easily amended to remove the
bias. In fact, if we define the new vector of random variables ε̃ = ε − mε ,
the equation (1.16) can be rewritten as

y − mε = Uθ + ε̃, (1.19)

and being clearly E [ε̃] = 0, E [ε̃ε̃′ ] = Σε , all the treatment can be repeated
by replacing y with y − mε . Therefore, the expressions of the LS and GM
estimators remain those in (1.17) and (1.18), with y replaced by y − mε .
The case in which the mean of ε is unknown is more intriguing. In some
cases, one may try to estimate it from the data, along with the parameter θ.
Assume for example that E [εi ] = m̄ε , ∀i. This means that E [ε] = m̄ε · 1,
where 1 = [1 1 ... 1]T . Now, one can define the extended parameter
vector θ̄ = [θ′ m̄ε ]T ∈ Rp+1 , and use the same decomposition as in (1.19) to
obtain
y = [U 1]θ̄ + ε̃
Then, one can apply the LS or GM estimator, by replacing U with [U 1], to
obtain a simultaneous estimate of the p parameters θ and of the scalar mean
m̄ε .

An important property of the Gauss-Markov estimator is that of being


the minimum variance estimator among all linear unbiased estimators, i.e.,
the BLUE (see Definition 1.8). In fact, the following result holds.

Theorem 1.6. Let y be a vector of random variables related to the param-


eter θ according to (1.16). Let Σ_ε be the covariance matrix of ε. Then, the BLUE estimator of θ is the Gauss-Markov estimator (1.18). The corresponding variance of the estimation error is given by

E[(θ̂_GM − θ)(θ̂_GM − θ)^T] = (U^T Σ_ε^{-1} U)^{-1}.    (1.20)

In the special case Σε = σε2 In (with In identity matrix of dimension n), i.e.,
when the variables ε are uncorrelated and have the same variance σε2 , the
BLUE estimator is the Least Squares estimator (1.17).

Proof
Since we consider the class of linear unbiased estimators, we have T (y) = Ay,
and E [Ay] = AE [y] = AUθ. Therefore, one must impose the constraint
AU = Ip to guarantee that the estimator is unbiased.
In order to find the minimum variance estimator, it is necessary to minimize
(in matricial sense) the covariance of the estimation error

E[(Ay − θ)(Ay − θ)^T] = E[(AUθ + Aε − θ)(·)^T]
                      = E[A ε ε^T A^T]
                      = A Σ_ε A^T,

where we have enforced the constraint AU = I_p in the second equality. Then, the BLUE estimator is obtained by solving the constrained optimization problem

A_BLUE = arg min_A A Σ_ε A^T
s.t. AU = I_p    (1.21)

and then setting T(y) = A_BLUE y.
Being the constraint AU = I_p linear in the matrix A, it is possible to parameterize all the admissible solutions A as

A = (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} + M    (1.22)

with M ∈ R^{p×n} such that MU = 0. It is easy to check that all matrices A defined by (1.22) satisfy the constraint AU = I_p. It is therefore sufficient to find the one that minimizes the quantity A Σ_ε A^T. By substituting A with the expression (1.22), one gets

A Σ_ε A^T = (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} Σ_ε Σ_ε^{-1} U (U^T Σ_ε^{-1} U)^{-1}
          + (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1} Σ_ε M^T
          + M Σ_ε Σ_ε^{-1} U (U^T Σ_ε^{-1} U)^{-1} + M Σ_ε M^T
          = (U^T Σ_ε^{-1} U)^{-1} + M Σ_ε M^T
          ≥ (U^T Σ_ε^{-1} U)^{-1},

where the second equality is due to MU = 0, while the final inequality exploits the fact that Σ_ε is positive definite and hence M Σ_ε M^T is positive semidefinite. Since the expression (U^T Σ_ε^{-1} U)^{-1} does not depend on M, we can conclude that the solution of problem (1.21) is obtained by setting M = 0 in (1.22), which amounts to choosing A_BLUE = (U^T Σ_ε^{-1} U)^{-1} U^T Σ_ε^{-1}. Therefore, the BLUE estimator coincides with the Gauss-Markov one. The expression of the covariance of the estimation error (1.20) is obtained from A Σ_ε A^T when M = 0.
Finally, if Σ_ε = σ_ε² I_n, one has A_BLUE = (U^T U)^{-1} U^T (whatever the value of σ_ε²) and hence the GM estimator boils down to the LS one. □

In Section 1.4 it has been observed that, if the measurement noise ε is


Gaussian, the Maximum Likelihood estimator can be computed by solving
the optimization problem (1.15). If the observations depend linearly on θ, as
in (1.16), such a problem becomes

θ̂_ML = arg min_θ (y − Uθ)^T Σ_ε^{-1} (y − Uθ).    (1.23)

As it has been noticed after Definition 1.13, the solution of (1.23) is actu-
ally the Gauss-Markov estimator. Therefore, we can state that: in the case
of linear observations corrupted by additive Gaussian noise, the Maximum
Likelihood estimator coincides with the Gauss-Markov estimator. Moreover,
it is possible to show that in this setting
Eθ[ ( ∂ ln f_y^θ(y) / ∂θ ) ( ∂ ln f_y^θ(y) / ∂θ )^T ] = U^T Σ_ε^{-1} U,

and hence the Gauss-Markov estimator is efficient (and UMVUE).


Finally, if the measurements are also independent and have the same variance σ_ε², i.e., the noise being Gaussian,

ε ∼ N(0, σ_ε² I_n),

then the GM estimator boils down to the LS one. Therefore: in the case of linear observations, corrupted by independent and identically distributed Gaussian noise, the Maximum Likelihood estimator coincides with the Least Squares estimator.

The following table summarizes the properties of the GM and LS estimators, depending on the assumptions made on the noise ε.

Assumptions on ε                          | Properties of GM                         | Properties of LS
none (Σ_ε known)                          | arg min_θ (y − Uθ)^T Σ_ε^{−1} (y − Uθ)   | arg min_θ ||y − Uθ||²
E[ε] = m_ε known,                         | unbiased, BLUE                           | unbiased, BLUE if Σ_ε = σ_ε² I_n
E[(ε − m_ε)(ε − m_ε)^T] = Σ_ε             |                                          |
ε ∼ N(m_ε, Σ_ε)                           | ML estimator, efficient, UMVUE           | ML estimator if Σ_ε = σ_ε² I_n

Table 1.1: Properties of GM and LS estimators.

Example 1.9. On an unknown parameter θ, we collect n measurements

yi = θ + vi , i = 1, . . . , n

where the vi are realizations of n random variables v i , independent, with


zero mean and variance σi2 , i = 1, . . . , n.
It is immediate to verify that the measurements yi are realizations of random
variables y i , with mean θ and variance σi2 . Therefore, the estimate of θ can
be cast in terms of the estimate of the mean of n random variables (see
Examples 1.1 and 1.5, and Exercise 1.1).

1.6 Bayesian Estimation


In the Bayesian estimation setting, the quantity to be estimated is not deter-
ministic, but it is modeled as a random variable. In particular, the objective
is to estimate the random variable x ∈ Rm , by using observations of the ran-
dom variable y ∈ Rn (we will denote the unknown variable to be estimated

by x instead of θ, to distinguish between the parametric and the Bayesian


framework). Clearly, the complete knowledge on the stochastic relationship
between x and y is given by the joint pdf fx,y (x, y).
As in the parametric setting, the aim is to find an estimator x̂ = T(y), where T(·): R^n → R^m.

Definition 1.14. In the Bayesian setting, an estimator T (·) is unbiased if

E [T (y)] = E [x] .

Similarly to what has been done in parametric estimation problems, it is


necessary to introduce a criterion to evaluate the quality of an estimator.

Definition 1.15. We define the Bayes risk function as the quantity

J_r = E[d(x, T(y))] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} d(x, T(y)) f_{x,y}(x, y) dx dy

where d(x, T (y)) denotes the distance between x and its estimate T (y),
according to a suitable metric.

Since the distance d(x, T (y)) is a random variable, the aim is to minimize
its expected value, i.e. to find

T*(·) = arg min_{T(·)} J_r.

1.6.1 Minimum Mean Square Error Estimator


A standard choice for the distance d(·) is the quadratic error

d(x, T(y)) = ||x − T(y)||².

Definition 1.16. The minimum Mean Square Error (MSE) estimator is defined as x̂_MSE = T*(·), where

T*(·) = arg min_{T(·)} E[||x − T(y)||²].    (1.24)

Notice that in (1.24), the expected value is computed with respect to


both random variables x and y, and hence it is necessary to know the joint
pdf fx,y (x, y).
The following fundamental result provides the solution to the minimum
MSE estimation problem.

Theorem 1.7. Let x be a random variable and y a vector of observations.


The minimum MSE estimator x̂M SE of x based on y is equal to the condi-
tional expected value of x given y:

x̂M SE = E [x|y] .

The previous result states that the estimator minimizing the MSE is the
a posteriori expected value of x, given the observation of y, i.e.
x̂_MSE = ∫_{−∞}^{+∞} x f_{x|y}(x|y) dx,    (1.25)

which is indeed a function of y.


Since it is easy to prove that

E [E [x|y]] = E [x] ,

one can conclude that the minimum MSE estimator is always unbiased.
The minimum MSE estimator has other attractive properties. In particular, if we consider the matrix

Q(x, T(y)) = E[(x − T(y))(x − T(y))^T],

it can be shown that:

• x̂_MSE is the estimator minimizing (in the matrix sense) Q(x, T(y)), i.e.

  Q(x, x̂_MSE) ≤ Q(x, T(y)),   ∀ T(y);

• x̂_MSE minimizes every monotonically increasing scalar function of Q(x, T(y)), like for example the trace of Q, corresponding to the MSE E[||x − T(y)||²].

The computation of the minimum MSE estimator may be difficult, or


even intractable, in practical problems, because it requires the knowledge of
the joint pdf fx,y (x, y) and the computation of the integral (1.25).

Example 1.10. Consider two random variables x and y, whose joint pdf is given by

f_{x,y}(x, y) = { −(3/2)x² + 2xy   if 0 ≤ x ≤ 1, 1 ≤ y ≤ 2
               { 0                 elsewhere

Let us find the minimum MSE estimator of x based on one observation of y. From Theorem 1.7, we know that

x̂_MSE = ∫_{−∞}^{+∞} x f_{x|y}(x|y) dx.

First, we need to compute

f_{x|y}(x|y) = f_{x,y}(x, y) / f_y(y).

The marginal pdf of y can be calculated from the joint pdf as

f_y(y) = ∫_0^1 (−(3/2)x² + 2xy) dx = [ −x³/2 + x²y ]_{x=0}^{x=1} = y − 1/2.

Hence, the conditional pdf is given by

f_{x|y}(x|y) = { (−(3/2)x² + 2xy) / (y − 1/2)   if 0 ≤ x ≤ 1, 1 ≤ y ≤ 2
              { 0                                elsewhere

Now, it is possible to compute the minimum MSE estimator

x̂_MSE = ∫_0^1 x (−(3/2)x² + 2xy) / (y − 1/2) dx
       = (1/(y − 1/2)) [ −(3/8)x⁴ + (2/3)x³ y ]_{x=0}^{x=1}
       = ((2/3)y − 3/8) / (y − 1/2).
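The closed-form expression above can be cross-checked numerically. The Python sketch below is an addition (it assumes NumPy/SciPy are available); it computes E[x|y] by direct integration of the joint pdf and compares it with the formula ((2/3)y − 3/8)/(y − 1/2).

# Rough numerical check of the minimum MSE estimator of Example 1.10.
from scipy import integrate

def f_xy(x, y):
    return -1.5 * x**2 + 2 * x * y            # joint pdf on 0<=x<=1, 1<=y<=2

def x_mse_numeric(y):
    num, _ = integrate.quad(lambda x: x * f_xy(x, y), 0.0, 1.0)
    den, _ = integrate.quad(lambda x: f_xy(x, y), 0.0, 1.0)
    return num / den                           # E[x | y]

for y in (1.2, 1.5, 1.9):                      # sample values of the observation
    closed_form = (2 * y / 3 - 3 / 8) / (y - 0.5)
    print(y, x_mse_numeric(y), closed_form)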


1.6.2 Linear Mean Square Error Estimator


We now restrict our attention to the class of affine linear estimators

T (y) = Ay + b (1.26)

in which the matrix A ∈ Rm×n and the vector b ∈ Rm are the coefficients of
the estimator to be determined. Among all estimators of the form (1.26), we
aim at finding the one minimizing the MSE.

Definition 1.17. The Linear Mean Square Error (LMSE) estimator is defined as x̂_LMSE = A*y + b*, where

(A*, b*) = arg min_{A,b} E[||x − Ay − b||²].    (1.27)

Theorem 1.8. Let x be a random variable and y a vector of observations, such that

E[x] = m_x,   E[y] = m_y,

E[ ( x − m_x ; y − m_y ) ( x − m_x ; y − m_y )^T ] = [ R_x      R_xy
                                                       R_xy^T   R_y  ].

Then, the solution of problem (1.27) is given by

A* = R_xy R_y^{−1},
b* = m_x − R_xy R_y^{−1} m_y,

and hence the LMSE estimator x̂_LMSE of x is given by

x̂_LMSE = m_x + R_xy R_y^{−1} (y − m_y).


Proof
First, observe that the cost to be minimized is

E[||x − Ay − b||²] = tr E[(x − Ay − b)(x − Ay − b)^T].

Since the trace is a monotonically increasing function, solving problem (1.27) is equivalent to finding A*, b* such that

E[(x − A*y − b*)(x − A*y − b*)^T] ≤ E[(x − Ay − b)(x − Ay − b)^T]   ∀ A, b.    (1.28)
Therefore, by denoting the estimation error as x̃ = x − Ay − b, one gets

E[x̃ x̃^T] = E[ (x − m_x − A(y − m_y) + m_x − A m_y − b)
              × (x − m_x − A(y − m_y) + m_x − A m_y − b)^T ]
          = R_x + A R_y A^T − R_xy A^T − A R_xy^T
            + (m_x − A m_y − b)(m_x − A m_y − b)^T
          = R_x + A R_y A^T − R_xy A^T − A R_xy^T + R_xy R_y^{−1} R_xy^T − R_xy R_y^{−1} R_xy^T
            + (m_x − A m_y − b)(m_x − A m_y − b)^T
          = R_x − R_xy R_y^{−1} R_xy^T + (R_xy R_y^{−1} − A) R_y (R_xy R_y^{−1} − A)^T
            + (m_x − A m_y − b)(m_x − A m_y − b)^T.    (1.29)

Observe that the last two terms of the previous expression are positive
semidefinite matrices. Hence, the solution of problem (1.28) is obtained by
choosing A∗ , b∗ such that the last two terms are equal to zero, i.e.

A∗ = Rxy Ry−1 ;
b∗ = mx − Amy = mx − Rxy Ry−1 my .

This concludes the proof. 

The LMSE estimator is unbiased because the expected value of the estimation error is equal to zero. In fact,

E[x̃] = E[x − x̂_LMSE] = m_x − E[m_x + R_xy R_y^{−1}(y − m_y)]
     = m_x − m_x − R_xy R_y^{−1} E[y − m_y] = 0.

By setting A = A* and b = b* in the last expression in (1.29), we can compute the variance of the estimation error of the LMSE estimator, which is equal to

E[x̃ x̃^T] = R_x − R_xy R_y^{−1} R_xy^T.

It is worth noting that, by interpreting R_x as the a priori uncertainty on x, R_x − R_xy R_y^{−1} R_xy^T represents the new uncertainty on x after having observed the measurement y. Since the matrix R_xy R_y^{−1} R_xy^T is always positive semidefinite, the effect of the observations is to reduce the uncertainty on x. Moreover, such a reduction depends on the size of R_xy, i.e., on the correlation between the measurement y and the unknown x (notice that there is no uncertainty reduction when R_xy = 0, as expected).
It is worth stressing that in order to compute the LMSE estimator it is
not necessary to know the joint pdf fx,y (x, y), but only the first and second
order statistics mx , my , Rx , Ry , Rxy .
An interesting property of the LMSE estimator is that the estimation error x̃ is uncorrelated with the observations y. In fact, one has

E[x̃ y^T] = E[ (x − m_x − R_xy R_y^{−1}(y − m_y)) y^T ]
          = R_xy − R_xy R_y^{−1} R_y = 0.    (1.30)

This result is often known as the orthogonality principle. Conversely, it is possible to show that if a linear estimator satisfies the orthogonality condition E[x̃ y^T] = 0, then it is the LMSE estimator.
In the case in which the random variables x, y are jointly Gaussian, with
mean and covariance matrix defined as in Theorem 1.8, we recall that the
conditional expected value of x given the observation of y is given by

E [x|y] = mx + Rxy Ry−1 (y − my ).

Therefore, we can conclude that: if x, y are Gaussian variables, the LMSE


estimator coincides with the minimum MSE estimator, i.e., x̂M SE = x̂LM SE .
In other words, in the Gaussian case the minimum MSE estimator is a linear
function of the observations y.
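A quick Monte Carlo sketch in Python can be used to check both the unbiasedness of the LMSE estimator and the orthogonality principle (1.30). This is an illustrative addition, not part of the original text; the mean vector and covariance matrix below are hypothetical.

# Monte Carlo check: for jointly Gaussian (x, y) the estimator
# m_x + Rxy Ry^{-1}(y - m_y) is unbiased and its error is uncorrelated with y.
import numpy as np

rng = np.random.default_rng(1)
m = np.array([1.0, 0.0, 2.0])                 # [m_x, m_y1, m_y2], hypothetical
C = np.array([[2.0, 0.8, 0.5],
              [0.8, 1.5, 0.3],
              [0.5, 0.3, 1.0]])               # joint covariance of (x, y1, y2)

Rxy, Ry = C[0, 1:], C[1:, 1:]
A = Rxy @ np.linalg.inv(Ry)                   # A* = Rxy Ry^{-1}

z = rng.multivariate_normal(m, C, size=200_000)
x, y = z[:, 0], z[:, 1:]
x_hat = m[0] + (y - m[1:]) @ A                # LMSE estimate for each sample
err = x - x_hat

print(err.mean())                             # close to 0: unbiased
print((err[:, None] * (y - m[1:])).mean(axis=0))  # close to 0: orthogonality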

Example 1.11. Let y 1 , y 2 be two noisy observations of the scalar random


variable x, having mean mx and variance σx2 :

y 1 = x + ε1 ,
y 2 = x + ε2 .

Let ε1 , ε2 be two independent random variables, with zero mean and variance
σ12 , σ22 , respectively. Under the assumption that x and εi , i = 1, 2, are
independent, we aim at computing the LMSE estimator of x.

Define the vectors y = (y 1 y 2 )T and ε = (ε1 ε2 )T , and rewrite the


measurement equations in the form

y = 1 x + ε,

where 1 = (1 1)T .
First, let us compute the mean of y

E [y] = E [1 x + ε] = 1 mx

In order to find the estimate x̂_LMSE, we have to compute the covariance matrices R_xy and R_y. We get

R_xy = E[(x − m_x)(1(x − m_x) + ε)^T] = σ_x² 1^T,

because x and ε are uncorrelated. Moreover,

R_y = E[(1(x − m_x) + ε)(1(x − m_x) + ε)^T] = 1 σ_x² 1^T + R_ε,

where

R_ε = [ σ_1²   0
        0      σ_2² ]

is the covariance matrix of ε. Finally, let us compute the inverse of the measurement covariance matrix:

R_y^{−1} = [ σ_x² + σ_1²   σ_x²
             σ_x²          σ_x² + σ_2² ]^{−1}
         = 1/(σ_x²(σ_1² + σ_2²) + σ_1²σ_2²) [ σ_x² + σ_2²   −σ_x²
                                               −σ_x²          σ_x² + σ_1² ].

Hence, the LMSE estimator is given by

x̂_LMSE = m_x + R_xy R_y^{−1} (y − 1 m_x)
        = m_x + σ_x² 1^T R_y^{−1} (y − 1 m_x)
        = m_x + σ_x²/(σ_x²(σ_1² + σ_2²) + σ_1²σ_2²) [ σ_2²  σ_1² ] ( y_1 − m_x ; y_2 − m_x )
        = m_x + (σ_2²(y_1 − m_x) + σ_1²(y_2 − m_x)) / (σ_1² + σ_2² + σ_1²σ_2²/σ_x²)
        = (m_x σ_1²σ_2²/σ_x² + σ_2² y_1 + σ_1² y_2) / (σ_1² + σ_2² + σ_1²σ_2²/σ_x²)
        = (m_x/σ_x² + y_1/σ_1² + y_2/σ_2²) / (1/σ_x² + 1/σ_1² + 1/σ_2²).

Notice that each measurement is weighted with a weight that is inversely


proportional to the variance of the noise affecting the measurement. More-
over, the a priori information on x (i.e., its mean mx and variance σx2 ), is
treated as an additional observation of x. In particular, it is interesting to
observe that if σx2 → +∞ (i.e., the a priori information on x is completely
unreliable), the estimate x̂LM SE takes on the same form of the Gauss-Markov
estimate of the mean mx (see Example 1.5 and Exercise 1.1). This highlights
the relationship between Bayesian and parametric estimation. △
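The final formula of Example 1.11 is easy to turn into code. The Python sketch below is an illustrative addition (the numerical values are hypothetical); it implements the inverse-variance weighting and shows the limiting behaviour as the prior variance σ_x² grows.

# Inverse-variance weighted fusion of a prior mean and two noisy readings.
def lmse_two_measurements(y1, y2, m_x, var_x, var1, var2):
    """LMSE estimate of Example 1.11: prior treated as one more observation."""
    num = m_x / var_x + y1 / var1 + y2 / var2
    den = 1.0 / var_x + 1.0 / var1 + 1.0 / var2
    return num / den

# Hypothetical numbers: with a very vague prior (huge var_x) the estimate
# approaches the Gauss-Markov weighted mean of the two measurements alone.
print(lmse_two_measurements(10.2, 9.6, m_x=8.0, var_x=4.0, var1=1.0, var2=2.0))
print(lmse_two_measurements(10.2, 9.6, m_x=8.0, var_x=1e9, var1=1.0, var2=2.0))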

1.7 Exercises

1.1. Verify that in the problem of Example 1.9, the LS and GM estimators
of θ coincide respectively with y in (1.2) and m̂BLU E in (1.10).

1.2. Let d_1, d_2 be two i.i.d. random variables, with pdf given by

f(δ) = { θ e^{−θδ}   if δ ≥ 0
       { 0           if δ < 0
Let δ1 , δ2 be the available observations of d1 , d2 . Find the Maximum Like-
lihood estimator of θ.

1.3. Let d_1, d_2 be independent Gaussian random variables such that

E[d_1] = m,   E[d_2] = 3m,   E[(d_1 − m)²] = 2,   E[(d_2 − 3m)²] = 4.

Let δ_1, δ_2 be the available observations of d_1, d_2. Find:

a) the minimum variance estimator of m among all linear unbiased esti-


mators;

b) the variance of such an estimator;

c) the Maximum Likelihood estimator (is it different from the estimator


in item a)?).

1.4. Two measurements are available on the unknown quantity x:

y_1 = x + d_1
y_2 = 2x + d_2

where d_1 and d_2 are independent disturbances modeled as random variables with pdf

f(δ) = { λ e^{−λδ}   if δ ≥ 0
       { 0           if δ < 0
a) Find the Maximum Likelihood estimator of x.

b) Determine if the ML estimator is unbiased.

1.5. Let x and y be two random variables with joint pdf

f_{x,y}(x, y) = { (1/(2θ³))(3x + y)   0 ≤ x ≤ θ, 0 ≤ y ≤ θ
               { 0                    elsewhere

where θ is a real parameter.

a) Assume θ = 1 and suppose that an observation y of the random variable


y is available. Compute the minimum MSE estimator x̂M SE of x, based
on the observation y.

b) Assume θ is unknown and suppose that an observation y of the random


variable y is available. Compute the ML estimator θ̂M L of the param-
eter θ, based on the measurement y. Establish if such an estimator is
unbiased.

c) Assume θ is unknown and suppose that two observations x and y of the


random variables x and y are available. Compute the ML estimator
θ̂M L of the parameter θ, based on the measurements x and y.

1.6. Let θ ∈ [−2, 2] and consider the function

f^θ(x) = { θx + 1 − θ/2   if x ∈ [0, 1]
         { 0               elsewhere

a) Show that for all θ ∈ [−2, 2], f θ is a probability density function.

b) Let y be a random variable with pdf f θ . Compute mean and variance


of y as functions of θ.

c) Compute the Maximum Likelihood estimator of θ, based on an obser-


vation y of the random variable y.

d) Let y_1, . . ., y_n be n random variables, each one distributed according to the pdf f^θ, and consider the estimator

   T(y_1, . . . , y_n) = 12 ( (1/n) Σ_{k=1}^{n} y_k − 1/2 ).

Show that T (·) is an unbiased estimator of θ.

e) Find the variance of the estimation error for the estimator T (·) defined
in item d), in the case n = 1. Compute the Fisher information I1 (θ)
and show that the inequality (1.11) holds.

1.7. Let a and b be two unknown quantities, for which we have three different
measurements:
y1 = a + v1
y2 = b + v2
y3 = a + b + v3
where v i , i = 1, 2, 3, are independent random variables, with zero mean.
Let E[v_1²] = E[v_3²] = 1 and E[v_2²] = 1/2. Find:

a) The LS estimator of a and b;

b) The GM estimator of a and b;


c) The variance of the estimation error E[(a − â)² + (b − b̂)²], for the estimators computed in items a) and b).

Compare the obtained estimates with those one would have if the observation
y 3 were not available. How does the variance of the estimation error change?

1.8. Consider two random variables x and y, whose joint pdf is

f_{x,y}(x, y) = { −(3/2)x² + 2xy   0 ≤ x ≤ 1, 1 ≤ y ≤ 2
               { 0                 elsewhere

Find the LMSE estimate x̂LM SE of x, based on one observation of y.


Plot the estimate x̂LM SE computed above and the minimum MSE estimate
x̂M SE derived in Example 1.10, as functions of y (the realization of y).
Compute the expected values of both estimates and compare them with the
a priori mean E [x].

1.9. Let x and y be two random variables with joint pdf

f_{x,y}(x, y) = { (1/12)(x + y) e^{−y}   0 ≤ x ≤ 4, y ≥ 0
               { 0                        elsewhere

Assume that an observation y of y is available.

a) Find the estimators x̂M SE and x̂LM SE of x, and plot them as functions
of the observation y.

b) Compute the MSE of the estimators obtained in item a) [Hint: use


MATLAB to compute the integrals].

1.10. Let X be an unknown quantity and assume the following measurement is available:

y = ln(1/X) + v

where v is a random variable, whose pdf is given by

f_v(v) = { e^{−v}   v ≥ 0
         { 0        v < 0

a) Find the Maximum Likelihood estimator of X. Establish if it is biased


or not. Is it possible to find an unbiased estimator of X?

b) Assume that X is a random variable independent of v, whose a priori pdf is given by

   f_X(x) = { 1   0 ≤ x ≤ 1
            { 0   otherwise

Find the MSE and LMSE estimators of X.

c) Plot the estimates obtained in items a) and b) as functions of y.


UNIT-III
UNIT-IV
Unit –II

2.1 Analysis of Variance

The analysis of variance is a powerful statistical tool for tests of significance. The test of significance based on the t-distribution is an adequate procedure only for testing the significance of the difference between two means. In a situation where we have three or more samples to consider at a time, an alternative procedure is needed for testing the hypothesis that all the samples are drawn from the same population, i.e., that they have the same mean. For example, five fertilizers are applied to four plots each of wheat and the yield of wheat on each of the plots is given. We may be interested in finding out whether the effect of these fertilizers on the yield is significantly different or, in other words, whether the samples have come from the same normal population. The answer to this problem is provided by the technique of analysis of variance. The basic purpose of the analysis of variance is to test the homogeneity of several means.

2.2 Cochran's Theorem

Let X_1, X_2, ..., X_n denote a random sample from a normal population N(0, σ²). Let the sum of the squares of these values be written in the form

Σ_{i=1}^{n} X_i² = Q_1 + Q_2 + ... + Q_k,

where Q_j is a quadratic form in X_1, X_2, ..., X_n with rank (degrees of freedom) r_j, j = 1, 2, ..., k. Then the random variables Q_1, Q_2, ..., Q_k are mutually independent and Q_j/σ² is a χ²-variate with r_j degrees of freedom if and only if

Σ_{j=1}^{k} r_j = n.

2.3 Completely Randomised Design (CRD)


In this design the experimental units are allotted at random to the
treatments, so that every unit gets the same chance of receiving every
treatment.

For example
Let there be five treatments each to be replicated four times. There are,
therefore, 20 plots. Let these plots be numbered from 1 to 20 conveniently.


Layout of CRD

1 2 3 4
A C A D
5 6 7 8
B D B D
9 10 11 12
C B C A
13 14 15 16
B D A C
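A layout of this kind can be generated by drawing the treatment labels at random. The following Python sketch is an illustrative addition (not part of the original text), written for the case of five treatments A-E replicated four times over 20 plots, as described above.

# Random allotment of treatments to plots for a Completely Randomised Design.
import random

random.seed(42)                                       # reproducibility only
treatments = [t for t in "ABCDE" for _ in range(4)]   # 5 treatments x 4 replicates
random.shuffle(treatments)                            # random allotment
layout = {plot: trt for plot, trt in enumerate(treatments, start=1)}
print(layout)                                         # plot number -> treatment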

Advantages of CRD

i) It is easy to layout the design.


ii) It results in the maximum use of the experimental units since all the
experimental materials can be used.
iii) It allows complete flexibility as any number of treatments and
replicates may be used. The number of replicates , if desired, can be
varied from treatment to treatment.
iv) The statistical analysis is easy even if the number of replicates are not
the same for all treatments
v) It provides the maximum number of degrees of freedom for the
estimation of the error variance, which increases the sensitivity or the
precision of the experiment for small experiments.
Disadvantages of CRD

i) In certain circumstances, the design suffers from the disadvantage of


being inherently less informative than other more sophisticated
layouts. This usually happens if the experimental material is not
homogeneous.
ii) Since randomisation is not restricted in any direction to ensure that the units receiving one treatment are similar to those receiving another treatment, the whole variation among the experimental units is included in the residual variance.

iii) This makes the design less efficient and results in less sensitivity in
detecting significant effects.
Applications

Completely randomized design is most useful in laboratory technique and


methodological studies, e.g., in physics, chemistry, in chemical and
biological experiments , in some green house studies, etc.,

2.3.1 Statistical Analysis of CRD

The model is

y_ij = μ + t_i + e_ij,   i = 1, 2, ..., k;  j = 1, 2, ..., n_i

where y_ij is the yield,
μ is the general mean effect,
t_i is the treatment effect,
e_ij is the error term with mean zero and variance σ².

E(y_ij) = μ + t_i, i = 1, 2, ..., k, can be estimated by the method of least squares, that is, by minimizing the error sum of squares.

E (ee) 2   yij  E ( yij ) 2  


ij

  yij  (   ti ) 
2

ij

(ee)
0


2 yij    t i (1)  0
ij

 2 yij    t i   0
ij

 y    ti  
0
0
2
ij
ij

Where y
i
ij  G , G= Grand total

26
G   ni    ni t i … (1)
i i

 (ee)
0
t i

2 yij    t i (1)  0  2 yij    t i   0


j j

 y    ti  
0
0
2
ij
j

 y     t
j
ij
j j
i 0

Where y j
ij  Ti

Ti  ni   ni t i  0 …(2)

From equation (1)

n t
i
i i  0, n
i
i n

G  nˆ  0

G
 ̂
n

From equation (2)

Ti  ni ̂  ni t i

Ti ni G
  tˆi
ni ni n

Ti G
  tˆi
ni n

27
Error Sum of Squares

E = Σ_ij [y_ij − μ − t_i]²
  = Σ_ij y_ij [y_ij − μ − t_i]    (the other terms vanish by the normal equations)
  = Σ_ij y_ij² − μ̂ Σ_ij y_ij − Σ_i t̂_i Σ_j y_ij
  = Σ_ij y_ij² − (G/n) G − Σ_i (T_i/n_i − G/n) T_i
  = (Σ_ij y_ij² − G²/n) − (Σ_i T_i²/n_i − G²/n),

where Σ_j y_ij = T_i. Hence

Error Sum of Squares (E.S.S) = Total Sum of Squares (T.S.S) − Treatment Sum of Squares (Tr.S.S),

where G²/n is the correction factor.

Table 2.1: ANOVA Table for CRD

Source of variation | d.f. | Sum of Squares (SS)            | Mean Sum of Squares (MSS) | F-ratio
Treatments          | k−1  | Tr.S.S = Σ_i T_i²/n_i − G²/n   | MSST = Tr.S.S/(k−1)       | F = MSST/MSSE
Error               | n−k  | E.S.S = T.S.S − Tr.S.S         | MSSE = E.S.S/(n−k)        |
                    |      | (by subtraction)               |                           |
Total               | n−1  | Σ_ij y_ij² − G²/n              |                           |

Under the null hypothesis H_0: t_1 = t_2 = ... = t_k, against the alternative that the t's are not all equal, the test statistic is

F = MSST/MSSE ~ F(k−1, n−k),

i.e., F follows the (central) F distribution with (k−1, n−k) d.f.

If F > F_(k−1, n−k)(α), then H_0 is rejected at the α% level of significance and we conclude that the treatments differ significantly. Otherwise H_0 is accepted.

Problem 2.1 :

A set of data involving four “tropical feed stuffs A, B, C, D” tried on 20 chicks


is given below. All the twenty chicks are treated alike in all respects except the
feeding treatments and each feeding treatment is given to 5 chicks. Analyse
the data.

Feed    Gain in weight               Total T_i
A        55   49   42   21   52     219
B        61  112   30   89   63     355
C        42   97   81   95   92     407
D       169  137  169   85  154     714

Grand Total G = 1,695

Figures in antique in the Table are not given in the original data. They are a
part of the calculations for analysis.
Weight gain of baby chicks fed on different feeding materials composed of
tropical feed stuffs is given in Table.

Solution:
Null hypothesis, Hₒ: tA = tB = tC = tD
i.e., the treatment effects are same. In other words, all the treatments (A, B, C,
D) are alike as regards their effect on increase in weight.
Alternative hypothesis, H₁: At least two of tᵢ‟s are different.
Raw S.S. (R.S.S.) = Σ_ij y_ij² = 55² + 49² + ... + 85² + 154² = 1,81,445

Correction factor (C.F.) = G²/N = (1,695)²/20 = 1,43,651.25

Total S.S. (T.S.S.) = R.S.S. − C.F. = 1,81,445 − 1,43,651.25 = 37,793.75

Treatment S.S. = (T_1² + T_2² + T_3² + T_4²)/5 − C.F.
               = (47,961 + 1,26,025 + 1,65,649 + 5,09,769)/5 − 1,43,651.25 = 26,234.95

Error S.S. = Total S.S. − Treatment S.S. = 37,793.75 − 26,234.95 = 11,558.80

Table 3.2: ANOVA Table for CRD

Source of variation | S.S.      | d.f. | M.S.S. = S.S./d.f. | Variance ratio F
Treatments          | 26,234.95 | 3    | 8,744.98           | F_T = 8744.98/722.42 = 12.105
Error               | 11,558.80 | 16   | 722.42             |
Total               | 37,793.75 | 19   |                    |

Test statistic: F_T ~ F(3, 16); tabulated F_0.05(3, 16) = 3.06. Hence F_T is highly significant, so we reject H_0 at the 5% level of significance and conclude that the treatments A, B, C and D differ significantly.
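The computations of Problem 2.1 can be reproduced with a short Python sketch. This is an addition to the text (it assumes SciPy is available); small differences from the hand-rounded figures above are possible.

# One-way ANOVA (CRD) for the chick-feed data of Problem 2.1.
from scipy.stats import f_oneway

A = [55, 49, 42, 21, 52]
B = [61, 112, 30, 89, 63]
C = [42, 97, 81, 95, 92]
D = [169, 137, 169, 85, 154]

groups = [A, B, C, D]
n = sum(len(g) for g in groups)
G = sum(sum(g) for g in groups)
cf = G**2 / n                                   # correction factor G^2/n
tss = sum(x**2 for g in groups for x in g) - cf
trss = sum(sum(g)**2 / len(g) for g in groups) - cf
ess = tss - trss
F = (trss / (len(groups) - 1)) / (ess / (n - len(groups)))
print(tss, trss, ess, F)
print(f_oneway(A, B, C, D))                     # cross-check of the F statistic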

2.4 Randomised Block Design(RBD)

If all the treatments are applied at random to relatively homogeneous units within each stratum or block, and replicated over all the blocks, the design is a randomised block design.

Advantages of RBD

(i) Accuracy:

This design has been shown to be more efficient or accurate


than C.R.D for most types of experimental work. The elimination of between
S.S. from residual S.S. usually results in a decrease of error mean S.S.

(ii) Flexibility:

In R.B.D no restriction are placed on the number of treatments
or the number of replicates. In general, at least two replicates are required to
carry out the test of significance (factorial design is an exception). In addition,
control (check) or some other treatments may be included more than once
without complications in the analysis.

(iii) Ease of Analysis:

Statistical analysis is simple and rapid. Moreover, the error of any treatment can be isolated and any number of treatments may be omitted from the analysis without complicating it.

Disadvantages of RBD

i) RBD may give misleading results if blocks are not


homogeneous.
ii) RBD is not suitable for large number of treatments in that
case the block size will increase and it may not be possible
to keep large blocks homogeneous.
iii) If the data on more than two plots is missing, the
statistical analysis becomes quite tedious and
complicated.

Layout of RBD: -

Let us consider five treatments A, B, C, D, E, each replicated 4 times. We divide the whole experimental area into 4 relatively homogeneous blocks and each block into 5 units; the treatments are allocated at random within the blocks. A particular layout may be as follows.

BlockI A B C D E
BlockII B C D E A
BlockIII C D E A B
BlockIV D E A B C
Layout:

               Block 1   Block 2   ...   Block r  | Means | Totals
Treatment 1      y11       y12     ...     y1r    |  ȳ1.  |  T1.
Treatment 2      y21       y22     ...     y2r    |  ȳ2.  |  T2.
  ...
Treatment i      yi1       yi2     ...     yir    |  ȳi.  |  Ti.
  ...
Treatment t      yt1       yt2     ...     ytr    |  ȳt.  |  Tt.
Means            ȳ.1       ȳ.2     ...     ȳ.r    |       |
Totals           T.1       T.2     ...     T.r    |       |  G

Let us assume that y_ij is the response or yield of the experimental unit receiving the i-th treatment in the j-th block.

2.4.1 Statistical Analysis of RBD

The model is

y_ij = μ + t_i + b_j + e_ij;   i = 1, 2, ..., t;  j = 1, 2, ..., r

where y_ij is the response or the yield of the experimental unit receiving the i-th treatment in the j-th block;
μ is the general mean effect;
t_i is the effect due to the i-th treatment;
b_j is the effect due to the j-th block or replicate;
e_ij ~ i.i.d. N(0, σ_e²).

Here μ, t_i and b_j are constants so that Σ_{i=1}^{t} t_i = 0 and Σ_{j=1}^{r} b_j = 0.

If we write

Σ_ij y_ij = G = Grand Total,
Σ_j y_ij = T_i = total for the i-th treatment,
Σ_i y_ij = B_j = total for the j-th block.
μ, t_i and b_j are estimated by the method of least squares, i.e., by minimizing

E = Σ_ij e_ij² = Σ_ij [y_ij − μ − t_i − b_j]².    ... (1)

Differentiating with respect to μ:

∂E/∂μ = 0   ⟹   Σ_ij (y_ij − μ − t_i − b_j) = 0
            ⟹   G − trμ − r Σ_i t_i − t Σ_j b_j = 0.    ... (2)

Differentiating with respect to t_i:

∂E/∂t_i = 0   ⟹   Σ_j (y_ij − μ − t_i − b_j) = 0
              ⟹   T_i − rμ − r t_i − Σ_j b_j = 0.    ... (3)

Differentiating with respect to b_j:

∂E/∂b_j = 0   ⟹   Σ_i (y_ij − μ − t_i − b_j) = 0
              ⟹   B_j − tμ − Σ_i t_i − t b_j = 0.    ... (4)

Using the side conditions Σ_i t_i = 0 and Σ_j b_j = 0:

From equation (2):   G = trμ̂   ⟹   μ̂ = G/(tr).
From equation (3):   T_i = rμ̂ + r t̂_i   ⟹   t̂_i = T_i/r − G/(tr).
From equation (4):   B_j = tμ̂ + t b̂_j   ⟹   b̂_j = B_j/t − G/(tr).

Error Sum of Squares

E = Σ_ij [y_ij − μ − t_i − b_j]²
  = Σ_ij y_ij (y_ij − μ − t_i − b_j)    (the other terms vanish by the normal equations)
  = Σ_ij y_ij² − μ̂ Σ_ij y_ij − Σ_i t̂_i Σ_j y_ij − Σ_j b̂_j Σ_i y_ij
  = Σ_ij y_ij² − G²/(tr) − Σ_i T_i (T_i/r − G/(tr)) − Σ_j B_j (B_j/t − G/(tr))
  = (Σ_ij y_ij² − G²/(tr)) − (Σ_i T_i²/r − G²/(tr)) − (Σ_j B_j²/t − G²/(tr)),

where Σ_ij y_ij = G, Σ_j y_ij = T_i and Σ_i y_ij = B_j. Hence

Error Sum of Squares (E.S.S) = Total Sum of Squares (T.S.S) − Treatment Sum of Squares (Tr.S.S) − Block Sum of Squares (B.S.S),

where the correction factor is G²/(tr), and

Total Sum of Squares = Σ_ij y_ij² − G²/(tr),
Treatment Sum of Squares = S_T² = Σ_i T_i²/r − G²/(tr),
Block Sum of Squares = S_B² = Σ_j B_j²/t − G²/(tr).

Table 2.2: ANOVA Table for RBD

Source of variation   | Degrees of freedom | Sum of squares | Mean sum of squares        | Variance ratio
Treatments            | t−1                | S_T²           | s_T² = S_T²/(t−1)          | F_T = s_T²/s_E²
Blocks or replicates  | r−1                | S_B²           | s_B² = S_B²/(r−1)          | F_B = s_B²/s_E²
Error                 | (t−1)(r−1)         | S_E²           | s_E² = S_E²/[(t−1)(r−1)]   |
Total                 | rt−1               |                |                            |

Under the null hypothesis H_0t: t_1 = t_2 = ... = t_t, against the alternative that the t's are not all equal, the test statistic is

F_T = s_T²/s_E² ~ F[(t−1), (t−1)(r−1)],

i.e., F_T follows the (central) F distribution with [(t−1), (t−1)(r−1)] d.f. Thus if F_T is greater than the tabulated F for [(t−1), (t−1)(r−1)] d.f. at a certain level of significance, usually 5%, then we reject the null hypothesis H_0t and conclude that the treatments differ significantly. If F_T is less than the tabulated value, then F_T is not significant and we conclude that the data do not provide any evidence against the null hypothesis, which may be accepted.

Similarly, under the null hypothesis H_0b: b_1 = b_2 = ... = b_r, against the alternative that the b's are not all equal, the test statistic is

F_B = s_B²/s_E² ~ F[(r−1), (t−1)(r−1)],

and we discuss its significance as explained above.

Problem 3.3

Consider the results given in the following table for an experiment


involving six treatments in four randomized blocks. The treatments are
indicated by numbers within parentheses.

Table 2.3: Yield for a randomized block experiment (treatment numbers in parentheses)

Blocks   Treatment and yield

1 24.7 (1) 27.7(3) 20.6(2) 16.2(4) 16.2(5) 24.9(6)

2 22.7(3) 28.8(2) 27.3(1) 15.0(4) 22.5(6) 17.0(5)

3 26.3(6) 19.6(4) 38.5(1) 36.8(3) 39.5(2) 15.4(5)

4 17.7(5) 31.0(2) 28.5(1) 14.1(4) 34.9(3) 22.6(6)

Test whether the treatments differ significantly.

Solution:

Null hypothesis:

H_0t: τ_1 = τ_2 = ... = τ_6 and H_0b: b_1 = b_2 = b_3 = b_4, i.e., the treatments as well as the blocks are homogeneous.

Alternative hypothesis:

H_1t: at least two τ_i's are different;  H_1b: at least two b_j's are different.

For finding the various S.S., we rearrange the above table as follows:

Table 2.4

Blocks         (1)        (2)        (3)        (4)       (5)       (6)      Block total (B_j)   B_j²
1              24.7       20.6       27.7       16.2      16.2      24.9     130.0               16,900.00
2              27.3       28.8       22.9       15.0      17.0      22.5     133.3               17,768.89
3              38.5       39.5       36.8       19.6      15.4      26.3     176.1               31,011.21
4              28.5       31.0       34.9       14.1      17.7      22.6     148.8               22,141.44
Treatment
totals (T_i)   119.0      119.9      122.1      64.9      66.3      96.3     588.5 = G
T_i²           14,161.00  14,376.01  14,908.41  4,212.01  4,395.69  9,273.69
Average        29.75      30.0       30.5       16.2      16.6      24.1

Correction Factor = 3,46,332.25/24 = 14,430.51

Raw S.S. = Σ_ij y_ij² = 15,780.76

Total S.S = R.S.S. – C.F. = 15,780.76 – 14,430.51 = 1,350.25

S.S. due to treatments (S.S.T) = ¼ ∑ Tᵢ² - C.F = (61,326.81/4)- 14,430.51 =


901.19

S.S due to blocks (S.S.B) = 1/6 ∑ Bᴊ² - C.F = 87,899.63/6 – 14430.51 =


219.43

Error S.S = T.S.S. – S.S.T. – S.S.B. = 1,350.25 – 901.19 – 219.43 = 229.63.

Table 2.5: ANOVA Table

Source of variation | d.f. | S.S.     | M.S.S.         | Variance ratio (F)
Treatment           | 5    | 901.19   | s_T² = 180.24  | F_t = 180.24/15.31 = 11.8
Block               | 3    | 219.43   | s_B² = 73.14   | F_b = 73.14/15.31 = 4.7
Error               | 15   | 229.63   | s_E² = 15.31   |
Total               | 23   | 1,350.25 |                |

Tabulated F(3, 15)(0.05) = 5.42 and F(5, 15)(0.05) = 4.5. Since under H_0t, F_t ~ F(5, 15) and under H_0b, F_b ~ F(3, 15), we see that F_t is significant while F_b is not significant at the 5% level of significance. Hence, H_0t is rejected at the 5% level of significance and we conclude that the treatment effects are not all alike. On the other hand, H_0b may be retained at the 5% level of significance and we may conclude that the blocks are homogeneous.
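The RBD sums of squares can also be computed programmatically. The Python sketch below is an illustrative addition (the small data array shown is hypothetical); it follows the formulas derived in Section 2.4.1.

# Two-way ANOVA for a Randomised Block Design from a treatments-by-blocks table.
import numpy as np

def rbd_anova(y):
    """y: array of shape (t, r) with y[i, j] = yield of treatment i in block j."""
    t, r = y.shape
    G = y.sum()
    cf = G**2 / (t * r)                        # correction factor
    tss = (y**2).sum() - cf
    trss = (y.sum(axis=1)**2).sum() / r - cf   # treatment sum of squares
    bss = (y.sum(axis=0)**2).sum() / t - cf    # block sum of squares
    ess = tss - trss - bss
    ft = (trss / (t - 1)) / (ess / ((t - 1) * (r - 1)))
    fb = (bss / (r - 1)) / (ess / ((t - 1) * (r - 1)))
    return trss, bss, ess, ft, fb

# Hypothetical small example: 3 treatments in 4 blocks
y = np.array([[12.0, 14.0, 11.0, 13.0],
              [15.0, 17.0, 16.0, 18.0],
              [10.0, 11.0,  9.0, 12.0]])
print(rbd_anova(y))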

2.4.2 Estimation of one Missing Value in RBD

Let the observation y_ij = x (say), in the j-th block and receiving the i-th treatment, be missing, as shown in Table 3.7.

Table 3.7

                           Treatments
            1      2     ...      i       ...    t    | Block totals
Block 1    y11    y21    ...     yi1      ...   yt1   |  y.1
Block 2    y12    y22    ...     yi2      ...   yt2   |  y.2
  ...
Block j    y1j    y2j    ...      x       ...   ytj   |  y.j' + x
  ...
Block r    y1r    y2r    ...     yir      ...   ytr   |  y.r
Treatment
totals     y1.    y2.    ...   yi.' + x   ...   yt.   |  y..' + x

where
y_i.' is the total of the known observations receiving the i-th treatment,
y_.j' is the total of the known observations in the j-th block, and
y_..' is the total of all known observations.

G 2 G   x 
2
Correction factor = 
tr tr

Total sum of square =  y 


G2
 x 2  cons tan t terms independen t of x 
2 G  x 2

ij
i j tr tr


2

 y . i   y i ..  x 

Sum of square due to treatment (S.S.Tr) = i
 C.F  i
 C.F
r r

Sum of square due to

y  y   x 
2 2
.j

 C.F     C.F
.j
Block (S.S.B) =
j

t r

Sum of square due to error = T.S.S –S.S.Tr-S.S.B =


  yi2.   y. j
2
 i 
( yij  C.F )    C.F   (  C.F )
2 j

i j  r  t
 

=
  
2

 
x 2  cons tan t terms independen t of x 
G   x  2   i.

y x
 
G  x
2 
  y. j   x 2 G   x 2 
  
tr  r tr   t tr 
 

 y   x 
2

 x 2  cons tan t terms independen t of x 


G   x  2
  i.
 
G  x
2

 y. j   x 
2

G  x
2

tr r tr t tr

 y   x 
2

   y. j   x   G   x 
2 2
 x 2  cons tan t terms independen t of x  
i.

r t tr

Differentiate with respect to x

40
 S .S .E 
0
x

 2x  2
 yi.  x   2 y. j   x   2 G   x   0  0
r t tr 2

 y   x   y   x 
x
i.
   ij   G   x   0
r t tr

 
trx  t ( yi.  x)  r  y. j  x   G   x 
  0
tr

trx  t  yi.  x   r  y. j   x   G  x   0  tx  0

 
trx  tyi.  tx  ry . j  rx  (G   x)  0

 
x(tr  t  r  1)  tyi.  ry . j  G   0

 
x(tr  t  r  1)  tyi.  ry . j  G 

 
x((t  1)(r  1))  tyi.  ry . j  G 

 
tyi.  ry . j  G 
x
(r  1)(t  1)

Problem 3.3

Suppose that the value for treatment 2 is missing in replication III. The data
will then be as presented in the table below.

Table 2.6 RBD data with one missing value.

Replication

Treatment Total

I II III IV

1 22.9 25.9 39.1 33.9 121.8

2 29.5 30.4 X 29.6 89.5

3 28.8 24.4 32.1 28.6 113.9

4 47.0 40.9 42.8 32.1 162.8

5 28.9 20.4 21.1 31.8 102.2

Total 157.1 142.0 135.1 156.0 590.2

X = [rR' + tT' − G'] / [(r−1)(t−1)]

= 4(135.1) + 5(89.5) – 590.2/(3)(4)

= 397.7/12

= 33.1

The upward bias,

B = [R' − (t−1)X]² / [t(t−1)]

= [135.1 – 4(33.1)]²/(5)(4)

= 7.29/20

= 0.3645

After substituting the estimated missing value, we get

Treatment 2 total = 89.5 + 33.1 = 122.6,

Replication 3 total = 135.1 + 33.1 = 168.2, and

The grand total = 590.2 + 33.1 = 623.3

Treatment SS = ¼ [(121.8)² + (122.6)² + (113.9)² + (162.8)² + (102.2)²] –


(623.3)²/20

= 19946.9725 – 19425.1445

= 521.8280

Corrected treatment SS = 521.8280 – 0.3645

= 521.4635

With these values the analysis of variance table is completed.

Table 2.7: Analysis of variance for the data in Table 2.6

Source of variation    df    SS          MS          F

Replication 3 69.1855 23.0618 1

Treatment 4 521.4635 130.3659 4.117

Error 11 347.9475 31.6316

Total 18 938.9610
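The missing-value formula derived in Section 2.4.2 is straightforward to code. The Python sketch below is an addition to the text; it simply applies the formula to the totals of Table 2.6.

# Estimate of a single missing value in an RBD.
def rbd_missing_value(t, r, T_prime, B_prime, G_prime):
    """t treatments, r replications; T', B', G' are totals of the known values."""
    return (t * T_prime + r * B_prime - G_prime) / ((t - 1) * (r - 1))

# Numbers from Table 2.6: t = 5, r = 4, T' = 89.5, B' (replication III) = 135.1
x = rbd_missing_value(t=5, r=4, T_prime=89.5, B_prime=135.1, G_prime=590.2)
print(round(x, 1))   # about 33.1, as obtained in the text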

2.4.3 Estimation of two missing values

Suppose that in an RBD with t treatments and r replications, two observations are missing. Let x and y be the two missing observations; they belong to two different blocks and receive two different treatments. We assume that x occurs in the j-th block and receives the i-th treatment, and that y occurs in the i-th block and receives the m-th treatment. Estimate the missing observations x and y.

Layout of two missing observations in RBD:

                        Treatments
            1     2    ...     i      ...     m      ...   t   | Block totals
Block 1    y11   y12                                           |  B1
Block 2    y21   y22                                           |  B2
  ...
Block j                        x                               |  Bj' + x
  ...
Block i                                      y                 |  Bi' + y
  ...
Block r                                                        |  Br
Totals     T1    T2    ...  Ti' + x   ...  Tm' + y   ...   Tt  |  G' + x + y

G 2 G   x  y 
2
Correction factor = 
tr tr

Total sum of square =


G2 G  x  y  2

 yij2 
i j tr
 x 2  y 2  cons tan t terms independent of x and y 
tr


2

 y . i   y i ..  x  y 

Sum of square due to treatment (S.S.Tr) = i
 C.F  i
 C.F
r r

Sum of square due to

y  y   x 
2
y 
2
.j

 C.F     C.F
.j
Block (S.S.B) =
j

t r

S.S.E=T.S.S – S.S.Tr-S.S.B

x 2
 y 2  cons tan t terms indepent of x and y  C.F 
    
2
=  Ti   x   Tm   B j  x   Bi  
2 2 2
y   y 
  
 C.F     C.F 
 r   r    t   t 
         

44
  T   x 
2
 T   y  
2

 2  i   m  
 x  y  cons tan t terms indepent of x and y  .C.F  
2

 
r r
 
 B  x 
2

 B  y 
2 
 
C.F       C.F
j i

 t t 

 T   x   T    B   x   B  
2 2 2 2
y  y 
  G   x  y 
2
 x2  y2     m     i
i j
...(1)
r r t t tr

Differentiate with respect to x in equation (1)

.S .S .E
0
x


 2 B j  x 
2x 
2(Ti  x)
    2(G   x  y)  0
r t tr

  B   x 
(T  x)  j
x i    (G   x  y)  0  0
r t tr 2

 
xtr  t  Ti  x   r  B j  x   G   x  y 
    0
tr

 
xtr  t  Ti  x   r ( B j  x)  G   x  y   0  tr  0
 

 
xtr  tTi  tx  rB j  rx  G   x  y  0

 
x(tr  t  r  1)  tTi  rB j  G   y

45
 
tTi  rB j  G   y
x
(t  1)(r  1)

Differentiate with respect to y in equation (1)

.S.S.E
0
y


 2 Bi  y 
2y 
2(Tm  y)
    2(G   x  y)  0
r t tr

  B   y 
(T  y )  i
y m    (G   x  y)  0  0
r t tr 2

 
ytr  t  Tm  y   r  Bi  y   G   x  y 
    0
tr

 
ytr  t  Tm  y   r ( Bi  y )  G   x  y   0  tr  0
 

 
ytr  tTm  ty  rBi  ry  G   x  y  0

 
y(tr  t  r  1)  tTm  rBi  G   x

 
tTm  rBi  G   x
y
(t  1)(r  1)
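The iteration just described is easy to code. The Python sketch below is an illustrative addition (the totals used in the call are hypothetical); it alternates the two formulas until the estimates settle.

# Iterative estimation of two missing values in an RBD.
def rbd_two_missing(t, r, Ti, Bj, Tm, Bi, G, iters=20):
    """Totals Ti, Bj, Tm, Bi, G exclude the missing observations."""
    d = (t - 1) * (r - 1)
    x = y = 0.0
    for _ in range(iters):                     # alternate the coupled formulas
        x = (t * Ti + r * Bj - G - y) / d
        y = (t * Tm + r * Bi - G - x) / d
    return x, y

# Hypothetical totals, for illustration only
print(rbd_two_missing(t=5, r=4, Ti=90.0, Bj=130.0, Tm=110.0, Bi=140.0, G=600.0))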

Problem 3.4

Suppose that one more value is missing in row 5 and column 3.

Table 2.8 Grain yield of paddy, kg/plot

E C D B A Total

26 42 39 37 24 168

A D E C B

24 33 21 (X) 38 116

D B A E C

47 45 31 29 31 183

B A C D E

38 24 36 41 34 173

C E B A D

41 24 (X) 26 30 121

Total 68 127 133 157 761

The treatment totals are

A : 129, B : 158, C : 150, D : 190,E : 134

The means for second row and fourth column in which C is missing are
116/4 = 29.0 and 133/4 = 33.25, respectively. Hence the first estimate for C is

C₁ = (29.00 + 33.25)/2 = 31.12

G' = 761 + 31.12 = 792.12

B₁ = [t(R' + C' + T') − 2G'] / [(t−1)(t−2)]
   = [5(121 + 127 + 158) − 2(792.12)] / [(5−1)(5−2)]
   = 2030/12 − 1584.24/12
   = 169.17 − 132.02
   = 37.15

For the second cycle we have

G' = 761 + 37.15 = 798.15

C₂ = [5(116 + 133 + 150) − 2(798.15)] / 12
   = 1995/12 − 1596.3/12
   = 166.25 − 133.03
   = 33.22

G' = 761 + 33.22 = 794.22

B₂ = 169.17 − 2(794.22)/12
   = 36.8

It can be seen that the estimated values for B are the same and those for C are very close. Hence we stop the iteration process at the third cycle. The final estimates for B and C for the missing plots are 36.8 and 33.3 respectively.

The column total, row total, etc., with respect to the missing plots are
modified by adding the estimated values. Thus we have,

Treatment B total = 158 + 36.8 = 194.8

Treatment C total = 150 + 33.3 = 183.3

Second row total = 116 + 33.3 = 149.3

Fifth row total = 121 + 36.8 = 157.8

Third column total = 127 + 36.8 = 163.8

Fourth column total = 133 + 33.3 = 166.3

Grand total = 761 + 36.8 + 33.3 = 831.1

The data is then analysed in the usual manner.

CF = (831.1)²/25

= 27629.0884

Total SS = 28902.130 – CF =1273.0416

Row SS = 27766.666 – CF = 137.5776

Column SS = 27667.026 – CF = 37.9376

Treatment SS = 28448.586 – CF = 819.4976

Error SS = 278.0288

Now ignoring the treatment classification the missing values are estimated
as in the case of RBD. The estimate of the second row, fourth column missing
value is 28.5; and that of fifth row, third column is 28.2. After substituting the
estimated values and analyzing the data as RBD, we get the error sum of
squares as 1031.5856. Then we have,

Corrected treatments SS = Error SS (RBD) – Error (LSD)

= 1031.5856 – 278.0288

= 753.5568

The final results are presented in the following table.

Table 2.9: Analysis of variance for the data

Source of
variation
df SS MS F

Row 4 137.5776 34.3944 1.237

Column 4 37.9376 9.4844 <1

Treatment 4 753.5568 188.3892 6.776

Error 10 278.0288 27.8029

Total 22 1273.0416

2.5 Latin square design (LSD)

The LSD is designed to eliminate the variation due to two factors, called rows and columns. In this design the number of treatments is equal to the number of replications.

Layout of design

In this design the number of treatments is equal to the number of replications. Thus in the case of m treatments there have to be m × m = m² experimental units. The whole of the experimental area is divided into m² experimental units (plots) arranged in a square, so that each row as well as each column contains m units.

The m treatments are allocated at random to these rows and columns in


such a way that every treatment occurs only once in each row and in each
column. Such a layout is LSD.

2x2 layouts

A B

B A

3x3 layouts

A B C

B C A

C A B

4x4 layouts

A B C D

B C D A

C D A B

D A B C

5×5 layout

A B C D E
B C D E A
C D E A B
D E A B C
E A B C D

Example:

An animal feeding experiment where the column groups may correspond to initial weight and the row groups to age.

Standard Latin square:

A Latin square in which the treatments, say A, B, C, etc., occur in the first row and first column in alphabetical order is called a standard Latin square.

Example:

A B

B A

Advantages of LSD

1. With two way grouping LSD controls more of the variation than
CRD or RBD.
2. The two way elimination of variation as a result of cross grouping
often results in small error mean sum of squares.
3. LSD is an incomplete 3-way layout. Its advantage over the complete 3-way layout is that instead of m³ experimental units only m² units are needed. Thus, a 4×4 LSD results in a saving of m³ − m² = 4³ − 4² = 64 − 16 = 48 observations over a complete 3-way layout.
4. The statistical analysis is simple though slightly complicated than
for RBD. Even 1 or 2 missing observations the analysis remains
relatively simple.
5. More than one factor can be investigated simultaneously.
Disadvantages of LSD
1. LSD is suitable for a number of treatments between 5 and 10; for more than 10 to 12 treatments the design is seldom used, since in that case the square becomes too large and does not remain homogeneous.
2. In case of missing plots the statistical analysis becomes quite
complex.

3. If one or two blocks in a field are affected by some disease or pest.


We can‟t omit because the number of rows columns and treatments
have to be equal.

2.5.1 Statistical Analysis of LSD

Let y_ijk (i, j, k = 1, 2, ..., m) denote the response from the unit in the i-th row, j-th column and receiving the k-th treatment.

The model is

y_ijk = μ + r_i + c_j + t_k + e_ijk;   i, j, k = 1, 2, ..., m,

where μ is the constant mean effect; r_i, c_j and t_k are the effects due to the i-th row, j-th column and k-th treatment respectively; and e_ijk is the error effect due to the random component, assumed to be normally distributed with mean zero and variance σ_e², i.e., e_ijk ~ N(0, σ_e²).

If we write

G= Total of all the m2 observations

Ri = Total of the m observations in the ith row

Cj = Total of the m observations in the jth column

Tk = Total of the m observations from kth treatment

Estimation by the method of least squares:

E = Σ_ijk e_ijk² = Σ_ijk (y_ijk − μ − r_i − c_j − t_k)².    ... (1)

Setting ∂E/∂μ = 0, ∂E/∂r_i = 0, ∂E/∂c_j = 0 and ∂E/∂t_k = 0:

∂E/∂μ = −2 Σ_ijk (y_ijk − μ − r_i − c_j − t_k) = 0
  ⟹   G − m²μ − m Σ_i r_i − m Σ_j c_j − m Σ_k t_k = 0.    ... (2)

∂E/∂r_i = −2 Σ_jk (y_ijk − μ − r_i − c_j − t_k) = 0
  ⟹   R_i − mμ − m r_i − Σ_j c_j − Σ_k t_k = 0.    ... (3)

∂E/∂c_j = −2 Σ_ik (y_ijk − μ − r_i − c_j − t_k) = 0
  ⟹   C_j − mμ − Σ_i r_i − m c_j − Σ_k t_k = 0.    ... (4)

∂E/∂t_k = −2 Σ_ij (y_ijk − μ − r_i − c_j − t_k) = 0
  ⟹   T_k − mμ − Σ_i r_i − Σ_j c_j − m t_k = 0.    ... (5)

The equations (2)-(5) are not independent. We assume the side conditions Σ_i r_i = 0, Σ_j c_j = 0 and Σ_k t_k = 0. Then:

From equation (2):   G = m²μ̂   ⟹   μ̂ = G/m².
From equation (3):   R_i = mμ̂ + m r̂_i   ⟹   r̂_i = R_i/m − G/m².
From equation (4):   C_j = mμ̂ + m ĉ_j   ⟹   ĉ_j = C_j/m − G/m².
From equation (5):   T_k = mμ̂ + m t̂_k   ⟹   t̂_k = T_k/m − G/m².

Error Sum of Squares

E = Σ_ijk e_ijk² = Σ_ijk (y_ijk − μ − r_i − c_j − t_k)²
  = Σ_ijk y_ijk (y_ijk − μ − r_i − c_j − t_k)    (the other terms vanish by the normal equations)
  = Σ_ijk y_ijk² − μ̂ G − Σ_i r̂_i R_i − Σ_j ĉ_j C_j − Σ_k t̂_k T_k
  = (Σ_ijk y_ijk² − G²/m²) − (Σ_i R_i²/m − G²/m²) − (Σ_j C_j²/m − G²/m²) − (Σ_k T_k²/m − G²/m²).

Hence:

Total Sum of Squares = Σ_ijk y_ijk² − G²/m²,
Row Sum of Squares = S_R² = Σ_i R_i²/m − G²/m²,
Column Sum of Squares = S_C² = Σ_j C_j²/m − G²/m²,
Treatment Sum of Squares = S_T² = Σ_k T_k²/m − G²/m².

Table 2.10: ANOVA Table for LSD

Source of variation | Degrees of freedom | Sum of squares | Mean sum of squares        | Variance ratio
Rows                | m−1                | S_R²           | s_R² = S_R²/(m−1)          | F_R = s_R²/s_E²
Columns             | m−1                | S_C²           | s_C² = S_C²/(m−1)          | F_C = s_C²/s_E²
Treatments          | m−1                | S_T²           | s_T² = S_T²/(m−1)          | F_T = s_T²/s_E²
Error               | (m−1)(m−2)         | S_E²           | s_E² = S_E²/[(m−1)(m−2)]   |
Total               | m²−1               |                |                            |

Let us set up the null hypotheses:

For row effects: H_0r: r_1 = r_2 = ... = r_m = 0
For column effects: H_0c: c_1 = c_2 = ... = c_m = 0
For treatment effects: H_0t: t_1 = t_2 = ... = t_m = 0

Alternative hypotheses:

For row effects, H_1r: at least two r_i's are different.
For column effects, H_1c: at least two c_j's are different.
For treatment effects, H_1t: at least two t_k's are different.

Under the respective null hypotheses, the statistics F_R, F_C and F_T each follow the F distribution with [(m−1), (m−1)(m−2)] d.f. Let F_α = F_α{(m−1), (m−1)(m−2)} be the tabulated value of F for [(m−1), (m−1)(m−2)] d.f. at the level of significance α. Thus if F_R > F_α we reject H_0r, and if F_R ≤ F_α we fail to reject H_0r. Similarly, we can test H_0c and H_0t.

Problem 3

An experiment was carried out to determine the effect of claying the ground on the yield of barley grain; the amounts of clay used were as follows:

A: No clay

B: Clay at 100 per acre

C: Clay at 200 per acre

D: Clay at 300 per acre.

The yields were in plots of 8 meters by 8 meters and are given in table.

I II III IV Row totals


(Rᵢ)

I D B C A 83.1

29.1 18.9 29.4 5.7

II C A D B 66.9

16.4 10.2 21.2 19.1

III A D B C 105.2

5.4 38.8 24.0 37.0

IV B C A D 105.0

24.9 41.7 9.5 28.9

Column
totals (C_j)   75.8   109.6   84.1   90.7   360.2

Perform the ANOVA and calculate the critical difference for the treatment
mean yields.

Solution:
The four treatment totals are:
A: 30.8, B: 86.9, C: 124.5, D: 118.0
Grand total G = 360.2, N = 16.
C.F. = (360.2)²/16 = 8,109.0025
Raw S.S. = (29.1)² + (18.9)² + ... + (9.5)² + (28.9)² = 10,052.08
Total S.S. = 10,052.08 − 8,109.0025 = 1,943.0775
S.S.R. = ¼ [(83.1)² + (66.9)² + (105.2)² + (105.0)²] − 8,109.0025
       = 33,473.26/4 − 8,109.0025 = 259.3125
S.S.C. = ¼ [(75.8)² + (109.6)² + (84.1)² + (90.7)²] − 8,109.0025
       = 33,057.10/4 − 8,109.0025 = 155.2725
S.S.T. = ¼ [(30.8)² + (86.9)² + (124.5)² + (118.0)²] − 8,109.0025
       = 37,924.50/4 − 8,109.0025 = 1,372.1225
Error S.S. = T.S.S. − S.S.R. − S.S.C. − S.S.T. = 156.3700
ANOVA Table for LSD

Source of variation | d.f. | S.S.       | M.S.S. = S.S./d.f. | Variance ratio
Rows                | 3    | 259.3125   | 86.4375            | F_R = 86.4375/26.0616 = 3.32 < 4.76
Columns             | 3    | 155.2725   | 51.7575            | F_C = 51.7575/26.0616 = 1.98 < 4.76
Treatments          | 3    | 1,372.1225 | 457.3742           | F_T = 457.3742/26.0616 = 17.55 > 4.76
Error               | 6    | 156.3700   | 26.0616            |
Total               | 15   | 1,943.0775 |                    |

Tabulated F₃, ₆ (0.05) = 4.76

Hence we conclude that the variation due to rows and columns is not
significant but the treatments, i.e., different levels of clay, have significant
effect on the yield.
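The LSD analysis of this problem can be verified with a short Python sketch. It is an addition to the text (it assumes NumPy is available) and computes the sums of squares directly from the yields and the treatment layout given above.

# LSD sums of squares and treatment F-ratio for the barley/clay example.
import numpy as np

yields = np.array([[29.1, 18.9, 29.4,  5.7],
                   [16.4, 10.2, 21.2, 19.1],
                   [ 5.4, 38.8, 24.0, 37.0],
                   [24.9, 41.7,  9.5, 28.9]])
trts = np.array([["D", "B", "C", "A"],
                 ["C", "A", "D", "B"],
                 ["A", "D", "B", "C"],
                 ["B", "C", "A", "D"]])

m = 4
G = yields.sum()
cf = G**2 / m**2                                     # correction factor
tss = (yields**2).sum() - cf
ssr = (yields.sum(axis=1)**2).sum() / m - cf         # rows
ssc = (yields.sum(axis=0)**2).sum() / m - cf         # columns
sst = sum(yields[trts == t].sum()**2 for t in "ABCD") / m - cf   # treatments
sse = tss - ssr - ssc - sst
f_t = (sst / (m - 1)) / (sse / ((m - 1) * (m - 2)))
print(ssr, ssc, sst, sse, f_t)                       # compare with the ANOVA table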

2.5.2 One Missing observation in LSD

Let us suppose that in m×m Latin Square, the observation occurring in the ith
row , jth column and receiving the kth treatment is missing. Let us assume that
its value is x, i.e., yijk=x

Ri‟ = Total of the known observations in the ith row.

Cj‟ = Total of the known observations in the jth column.

Tk‟ = Total of the known observations receiving kth treatment.

G' = total of the known observations.

Then

S.S.E. = x² − (R_i' + x)²/m − (C_j' + x)²/m − (T_k' + x)²/m + 2(G' + x)²/m² + (terms independent of x).

Differentiating with respect to x and equating to zero:

2x − 2(R_i' + x)/m − 2(C_j' + x)/m − 2(T_k' + x)/m + 4(G' + x)/m² = 0.

Multiplying by m²/2:

m²x − m(R_i' + x) − m(C_j' + x) − m(T_k' + x) + 2(G' + x) = 0
x(m² − 3m + 2) = m(R_i' + C_j' + T_k') − 2G'

and hence, since m² − 3m + 2 = (m − 1)(m − 2),

x = [m(R_i' + C_j' + T_k') − 2G'] / [(m − 1)(m − 2)].
