

STATISTICAL INFERENCE

BY DR RAJIV SAKSENA
DEPARTMENT OF STATISTICS
UNIVERSITY OF LUCKNOW

Statistical data are nothing but a random sample of observations drawn from a population described by a random variable whose probability distribution is unknown or only partly known, and we try to learn about the properties of the population from the properties of the sample. This inductive process of going from the known sample to the unknown population is called 'Statistical Inference'.

Formally, let X be a random variable describing the population under investigation. Suppose X has p.m.f. fθ(x) = P(X = x) or p.d.f. fθ(x), which depends on some unknown parameter θ (single or vector valued) that may have any value in a set Ω (called the parameter space). We assume that the functional form of fθ(x) is known but not the parameter θ (except that θ ∈ Ω). For example, the family of distributions {fθ(x), θ ∈ Ω} may be the family of Poisson distributions {P(λ), λ > 0} or of normal distributions {N(μ, σ²), −∞ < μ < ∞, σ > 0}.

The two problems of statistical inference are:

1. To estimate the value of 𝜃 − problem of estimation


2. To test a hypothesis about θ − the problem of testing of hypotheses

POINT ESTIMATION

Definition: A random sample of size n from the distribution of X is a set of independent and identically distributed random variables {x₁, x₂, …, xₙ}, each of which has the same distribution as that of X. The joint probability (density) of the sample is given by

fθ(x₁, x₂, …, xₙ) = fθ(x₁) fθ(x₂) … fθ(xₙ)

Definition: A statistic T = T(x₁, x₂, …, xₙ) is any function of the sample values which does not depend on the unknown parameter θ. Evidently, T is a random variable which has its own probability distribution (called the 'sampling distribution' of T).

For example, x̄ = (1/n) Σᵢ xᵢ, s² = (1/(n−1)) Σᵢ (xᵢ − x̄)², X₍₁₎ = min(x₁, x₂, …, xₙ) and X₍ₙ₎ = max(x₁, x₂, …, xₙ) are some statistics.

If we use the statistic T to estimate the unknown parameter θ, it is called an estimator (or point estimator) of θ, and the value of T obtained from a given sample is its 'estimate'.

Remark: Obviously, for T to be a good estimator of θ, the difference |T − θ| should be as small as possible. However, since T is itself a random variable, all that we can hope for is that it is close to θ with high probability.

Theorem: Let (X₁, X₂, …, Xₙ) be a random sample of n observations on X with mean E(X) = μ and variance Var(X) = σ². Let the sample mean and sample variance be x̄ = (1/n) Σᵢ xᵢ and s² = (1/n) Σᵢ (xᵢ − x̄)². Then,

(i) E(x̄) = μ

(ii) V(x̄) = σ²/n

(iii) E(s²) = ((n−1)/n) σ²

Proof: We have

E(x̄) = E((1/n) Σᵢ xᵢ) = (1/n) Σᵢ E(xᵢ) = μ

V(x̄) = V((1/n) Σᵢ xᵢ) = (1/n²) Σᵢ V(xᵢ) = σ²/n

E(n s²) = E Σᵢ (xᵢ − x̄)²

= E Σᵢ [(xᵢ − μ) − (x̄ − μ)]²

= E [Σᵢ (xᵢ − μ)² − n(x̄ − μ)²]

= Σᵢ E(xᵢ − μ)² − n E(x̄ − μ)²

= nσ² − n·(σ²/n)

= (n − 1)σ²

so that E(s²) = ((n−1)/n) σ².
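As an informal check of (i)–(iii), the short Python sketch below (a minimal illustration, not part of the original notes; the sample size, replication count and the choice of an exponential population are assumptions) simulates repeated samples and compares the empirical averages with μ, σ²/n and ((n−1)/n)σ².

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000          # assumed sample size and number of replications
mu, sigma2 = 1.0, 1.0          # exponential(1) population: mean 1, variance 1

samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)    # sample means
s2 = samples.var(axis=1)       # sample variances with divisor n

print("E(xbar) ~", xbar.mean(), " (theory:", mu, ")")
print("V(xbar) ~", xbar.var(), " (theory:", sigma2 / n, ")")
print("E(s^2)  ~", s2.mean(), " (theory:", (n - 1) / n * sigma2, ")")
```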

PROPERTIES OF ESTIMATORS

UNBIASEDNESS:

An estimator T of an unknown parameter 𝜃 is called unbiased if

𝐸 (𝑇) = 𝜃 for all 𝜃 ∈ Ω



Example: If (x₁, x₂, …, xₙ) is a random sample from any population with mean μ and variance σ², the sample mean x̄ is an unbiased estimator of μ, but the sample variance s² (with divisor n) is not an unbiased estimator of σ².

However, n s²/(n−1) = (1/(n−1)) Σᵢ (xᵢ − x̄)² is an unbiased estimator of σ².

Ex: If (x₁, x₂, …, xₙ) is a random sample from a normal distribution N(μ, 1), show that T = (1/n) Σᵢ xᵢ² − 1 is an unbiased estimator of μ².

Soln: E(T) = E[(1/n) Σᵢ xᵢ² − 1] = (1/n) Σᵢ E(xᵢ²) − 1

E(xᵢ²) = V(xᵢ) + [E(xᵢ)]² = 1 + μ²

so E(T) = (1/n) Σᵢ (μ² + 1) − 1 = μ².

Example: Let (x₁, x₂, …, xₙ) be a random sample from a Bernoulli distribution fθ(x) = θˣ(1 − θ)^{1−x} (x = 0, 1). Show that T = Y(Y − 1)/(n(n − 1)) is an unbiased estimator of θ², where Y = Σᵢ xᵢ.

Soln: We know that E(xᵢ) = θ and V(xᵢ) = θ(1 − θ), so that E(Y) = nθ and V(Y) = nθ(1 − θ).

Now

E(Y(Y − 1)) = E(Y²) − E(Y)

= V(Y) + [E(Y)]² − E(Y)

= nθ(1 − θ) + n²θ² − nθ

= n(n − 1)θ²

E(T) = E[Y(Y − 1)/(n(n − 1))] = θ²

showing T to be an unbiased estimator of θ².

Example: Show that the mean x̄ of a random sample of size n from the exponential distribution fθ(x) = (1/θ) e^{−x/θ} (x > 0) is an unbiased estimator of θ and has variance θ²/n.

Soln: We know that

E(xᵢ) = θ and V(xᵢ) = θ² (i = 1, …, n)

so E(x̄) = θ and V(x̄) = θ²/n.

Example: Let (x₁, x₂, …, xₙ) be a random sample from a normal distribution with mean 0 and variance θ (0 < θ < ∞). Show that T = Σᵢ xᵢ²/n is an unbiased estimator of θ and has variance 2θ²/n.

Soln: We know that

E(xᵢ) = 0, E(xᵢ²) = V(xᵢ) = θ

E(T) = (1/n) Σᵢ E(xᵢ²) = θ

Also E(xᵢ⁴) = μ₄ = 3θ², so

V(T) = V((1/n) Σᵢ xᵢ²)

= (1/n²) Σᵢ V(xᵢ²)

= (1/n²) Σᵢ [E(xᵢ⁴) − {E(xᵢ²)}²]

= (1/n²) Σᵢ [3θ² − θ²]

= 2θ²/n

Example: Let (x₁, x₂, …, xₙ) be a random sample from the rectangular distribution R(0, θ) having p.d.f.

fθ(x) = 1/θ, 0 ≤ x ≤ θ (θ > 0); 0 otherwise.

Show that T₁ = 2x̄, T₂ = ((n+1)/n) Yₙ and T₃ = (n+1) Y₁ are all unbiased for θ, where Y₁ = min(x₁, x₂, …, xₙ) and Yₙ = max(x₁, x₂, …, xₙ).

Soln: We know that

E(X) = θ/2 and V(X) = θ²/12

E(T₁) = 2 E(Σᵢ xᵢ/n) = θ and V(T₁) = θ²/(3n)

To obtain the expectations of T₂ and T₃ we need their distributions.

The d.f. of Yₙ is

F(y) = P(Yₙ ≤ y) = P(max(x₁, x₂, …, xₙ) ≤ y) = P(x₁ ≤ y, …, xₙ ≤ y) = [P(X ≤ y)]ⁿ = (y/θ)ⁿ = yⁿ/θⁿ

so the p.d.f. of Yₙ is g(y) = n y^{n−1}/θⁿ for 0 ≤ y ≤ θ, and 0 elsewhere.

Hence E(Yₙ) = ∫₀^θ y · n y^{n−1}/θⁿ dy = (n/(n+1)) θ

or E(((n+1)/n) Yₙ) = θ

so that T₂ is unbiased for θ.

[We can check that V(T₂) = θ²/(n(n+2)).]

Again, the d.f. of Y₁ is

F(y) = P(Y₁ ≤ y) = P(min(x₁, x₂, …, xₙ) ≤ y)

= 1 − P(x₁ > y, x₂ > y, …, xₙ > y)

= 1 − [1 − P(X ≤ y)]ⁿ

= 1 − [1 − y/θ]ⁿ

so the p.d.f. of Y₁ is g(y) = n(θ − y)^{n−1}/θⁿ for 0 ≤ y ≤ θ, and 0 elsewhere.

Hence E(Y₁) = ∫₀^θ y · n(θ − y)^{n−1}/θⁿ dy

= (n/θⁿ) { [−y(θ − y)ⁿ/n]₀^θ + (1/n) ∫₀^θ (θ − y)ⁿ dy }

= (n/θⁿ) (1/n) [−(θ − y)^{n+1}/(n+1)]₀^θ

= θ/(n+1)

so that E(T₃) = E[(n+1)Y₁] = θ.

[We can check that V(T₃) = (n/(n+2)) θ², so that V(T₂) < V(T₁) < V(T₃).]
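A quick simulation can illustrate the ordering V(T₂) < V(T₁) < V(T₃). The sketch below (a minimal illustration; θ, n and the number of replications are assumed values, not from the notes) draws repeated samples from R(0, θ) and reports the empirical mean and variance of each estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 8, 200_000        # assumed values for illustration

x = rng.uniform(0.0, theta, size=(reps, n))
T1 = 2 * x.mean(axis=1)                  # 2 * sample mean
T2 = (n + 1) / n * x.max(axis=1)         # based on the sample maximum
T3 = (n + 1) * x.min(axis=1)             # based on the sample minimum

for name, T in [("T1", T1), ("T2", T2), ("T3", T3)]:
    print(name, "mean ~", round(T.mean(), 4), "var ~", round(T.var(), 5))
# Theoretical variances: T1: theta^2/(3n), T2: theta^2/(n(n+2)), T3: n*theta^2/(n+2)
```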

Example: Let (x₁, x₂, …, xₙ) be a random sample from the rectangular distribution R(θ, 2θ) having p.d.f.

f(x, θ) = 1/θ, θ ≤ x ≤ 2θ; 0 elsewhere.

Show that T₁ = ((n+1)/(2n+1)) x₍ₙ₎, T₂ = ((n+1)/(n+2)) x₍₁₎, T₃ = ((n+1)/(5n+4)) [2x₍ₙ₎ + x₍₁₎] and T₄ = (2/3) x̄ are all unbiased.

Soln: We can show that the distributions of x₍ₙ₎ and x₍₁₎ have p.d.f.s given by

f_{x(n)}(y) = n(y − θ)^{n−1}/θⁿ, θ ≤ y ≤ 2θ

f_{x(1)}(y) = n(2θ − y)^{n−1}/θⁿ, θ ≤ y ≤ 2θ

Example: Let y₁, y₂, y₃ be the order statistics of a random sample of size 3 from a uniform distribution having p.d.f. f(x, θ) = 1/θ (0 ≤ x ≤ θ). Show that 4y₁, 2y₂ and (4/3)y₃ are all unbiased estimators of θ. Also obtain their variances.

Soln: We can show that Y₁, Y₂, Y₃ have p.d.f.s

f_{y1}(y) = 3(θ − y)²/θ³, 0 ≤ y ≤ θ

f_{y2}(y) = 6y(θ − y)/θ³, 0 ≤ y ≤ θ

f_{y3}(y) = 3y²/θ³, 0 ≤ y ≤ θ

E(y₁) = θ/4, E(y₂) = θ/2, E(y₃) = 3θ/4

V(y₁) = 3θ²/80, V(y₂) = θ²/20, V(y₃) = 3θ²/80

*If y₁, y₂ are two unbiased estimators of θ with variances σ₁², σ₂² and correlation coefficient ρ between them, the linear combination which is unbiased and has minimum variance is

Y = [(σ₂² − ρσ₁σ₂) y₁ + (σ₁² − ρσ₁σ₂) y₂] / (σ₁² + σ₂² − 2ρσ₁σ₂)

*If y₁, y₂, …, yₙ are independent unbiased estimators of θ with variances σᵢ² (i = 1, 2, …, n), the linear combination with minimum variance is

Y = k₁y₁ + k₂y₂ + … + kₙyₙ

where kᵢ = (1/σᵢ²) / Σⱼ (1/σⱼ²),

i.e. Y = [(1/σ₁²)y₁ + (1/σ₂²)y₂ + … + (1/σₙ²)yₙ] / [(1/σ₁²) + (1/σ₂²) + … + (1/σₙ²)]

Example: Let T be an unbiased estimator of θ. Does it follow that T² and √T are unbiased for θ² and √θ respectively?

Soln: V(T) = E(T²) − [E(T)]²

If E(T²) = θ², then V(T) = 0, so that P(T = θ) = 1, which is impossible since T has to be independent of θ.

Also, V(√T) = E(T) − (E√T)²

If E(√T) = √θ, then V(√T) = 0, so that P(√T = √θ) = P(T = θ) = 1, which is again impossible.

Example: Let y₁, y₂ be independent unbiased estimators of θ having finite variances (σ₁², σ₂², say). Obtain the linear combination of y₁, y₂ which is unbiased and has the smallest variance.

Soln: Let Y = k y₁ + k′ y₂.

Evidently k + k′ = 1, or k′ = 1 − k.

Then V(Y) = V[k y₁ + (1 − k) y₂] = k²σ₁² + (1 − k)²σ₂².

Minimising V(Y) w.r.t. k, we get

2kσ₁² − 2(1 − k)σ₂² = 0

k = σ₂²/(σ₁² + σ₂²)

The linear combination with minimum variance is

Y = [σ₂²/(σ₁² + σ₂²)] y₁ + [σ₁²/(σ₁² + σ₂²)] y₂ = [(1/σ₁²)Y₁ + (1/σ₂²)Y₂] / [(1/σ₁²) + (1/σ₂²)]

Note: if σ₁² = 2σ₂², then k = 1/3.
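As an illustration of the inverse-variance weighting above, the snippet below (a sketch; the variances and the simulated estimators are invented for the example) combines two independent unbiased estimators and compares the variance of the combination with each component.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 5.0
var1, var2 = 2.0, 1.0                        # assumed variances of y1 and y2
reps = 200_000

y1 = rng.normal(theta, np.sqrt(var1), reps)  # independent unbiased estimators
y2 = rng.normal(theta, np.sqrt(var2), reps)

k = var2 / (var1 + var2)                     # optimal weight on y1 (= 1/3 when var1 = 2*var2)
Y = k * y1 + (1 - k) * y2

print("weight k =", k)
print("Var(y1) ~", y1.var(), "Var(y2) ~", y2.var(), "Var(Y) ~", Y.var())
print("theoretical Var(Y) =", var1 * var2 / (var1 + var2))
```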

Remarks: (𝑖) An unbiased estimator may not exist. Let x be a random variable with Bernoulli
distribution.

𝑓𝜃 (𝑥) = 𝜃 𝑥 (1 − 𝜃)1−𝑥 , 𝑥 = 0,1



It can be shown that no unbiased estimator exists for 𝜃 2 .

(ii) An unbiased estimator may be absurd.

Let X be a random variable having the Poisson distribution P(λ) and suppose we want to estimate g(λ) = e^{−3λ}. Consider a sample of one observation and the estimator T(X) = (−2)^X. Then E(T) = e^{−3λ}, so that T is an unbiased estimator of e^{−3λ}; but T(x) > 0 for x even and T(x) < 0 for x odd, which is absurd since e^{−3λ} is always positive.

(iii) Instead of the parameter θ we may be interested in estimating a function g(θ). g(θ) is said to be 'estimable' if there exists an estimator T such that E(T) = g(θ) for all θ ∈ Ω.

Minimum Variance Unbiased (MVU) estimators : The class of unbiased estimators may, in
general, be quite large and we would like to choose the best estimator from this class. Among
two estimators of 𝜃 which are both unbiased , we would choose the one with smaller variance.
The reason for doing this rests on the interpretation of variance as a measure of concentration
about the mean. Thus, if T is unbiased for 𝜃, then by Chebyshev’s inequality-

P{|T − θ| ≤ ε} ≥ 1 − Var(T)/ε²

Therefore, the smaller 𝑉𝑎𝑟(𝑇) is, the larger the lower bound of the probability of concentration of T
about 𝜃 becomes. Consequently, within the restricted class of unbiased estimators we would choose
the estimator with the smallest variance.

Definition: An estimator T = T(X₁, …, Xₙ) is said to be a uniformly minimum variance unbiased (UMVU) estimator of θ (or of g(θ)) if it is unbiased and has the smallest variance within the class of unbiased estimators of θ (or of g(θ)) for all θ ∈ Ω. That is, if T′ is any other unbiased estimator of θ, then

Var(T) ≤ Var(T′) for all θ ∈ Ω

Suppose we decide to restrict ourselves to the class of all unbiased estimators with finite variance.
The problem arises as to how we find an UMVU estimator, if such an estimator exists. For this we
would first determine a lower bound for the variances of all estimators (in the class of unbiased
estimators under consideration) and then would try to determine an unbiased estimator whose
variance is equal to this lower bound. The lower bound for the variances will be given by the
Cramer-Rao inequality for which we assume the following regularity conditions:

Let X be a random variable with p.d.f. f(x; θ), θ ∈ Ω.

(i) Ω is an open interval (finite or not).

(ii) f(x; θ) is positive on a set S independent of θ.

(iii) (∂/∂θ) f(x; θ) exists for all θ ∈ Ω.

(iv) ∫…∫ f(x₁; θ) f(x₂; θ) … f(xₙ; θ) dx₁ … dxₙ may be differentiated under the integral sign.

(v) ∫…∫ T(x₁, x₂, …, xₙ) f(x₁; θ) … f(xₙ; θ) dx₁ … dxₙ may be differentiated under the integral sign, where T(X₁, …, Xₙ) is any unbiased estimator of θ.

Cramer-Rao inequality: Let (X₁, …, Xₙ) be a random sample of n observations on X with p.d.f. f(x; θ), and suppose the above regularity conditions hold. If T is any unbiased estimator of θ, then

Var(T) ≥ 1 / { n E[(∂/∂θ) log f(x; θ)]² }

Proof: We have

∫ f(xᵢ; θ) dxᵢ = 1, i = 1, 2, …, n

which gives, on differentiating w.r.t. θ,

∫ (∂/∂θ) f(xᵢ; θ) dxᵢ = 0

or ∫ [(∂/∂θ) log f(xᵢ; θ)] f(xᵢ; θ) dxᵢ = 0 ……(A)

or E[(∂/∂θ) log f(xᵢ; θ)] = 0 ……(1)

Also, since T is an unbiased estimator of θ, we have

E(T) = ∫…∫ T(x₁, …, xₙ) f(x₁; θ) … f(xₙ; θ) dx₁ … dxₙ = θ

which gives, on differentiating w.r.t. θ,

∫…∫ T(x₁, …, xₙ) (∂/∂θ)[∏ᵢ f(xᵢ; θ)] dx₁ … dxₙ = 1 ……(2)

But

(∂/∂θ) ∏ᵢ f(xᵢ; θ) = Σᵢ [(∂/∂θ) f(xᵢ; θ)] ∏_{j≠i} f(xⱼ; θ)

= Σᵢ [ (1/f(xᵢ; θ)) (∂/∂θ) f(xᵢ; θ) ] ∏ⱼ f(xⱼ; θ)

= [Σᵢ (∂/∂θ) log f(xᵢ; θ)] ∏ⱼ f(xⱼ; θ)

So that (2) becomes

∫…∫ T(x₁, …, xₙ) [Σᵢ (∂/∂θ) log f(xᵢ; θ)] f(x₁; θ) … f(xₙ; θ) dx₁ … dxₙ = 1

or E(TZ) = 1 ……(3)

where

Z = Σᵢ (∂/∂θ) log f(xᵢ; θ)

From (1) we immediately get

E(Z) = Σᵢ E[(∂/∂θ) log f(xᵢ; θ)] = 0 ……(4)

and

Var(Z) = Σᵢ E[(∂/∂θ) log f(xᵢ; θ)]² = n E[(∂/∂θ) log f(x; θ)]² ……(5)

Now, Cov(T, Z) = E(TZ) − E(T)E(Z) = 1

(i) An unbiased estimator T has variance equal to the lower bound 1/{n E[(∂/∂θ) log f(x; θ)]²} if and only if T is of the form T = θ + b_θ Z, where Z = Σᵢ (∂/∂θ) log f(xᵢ; θ).

Proof:

V(T) = 1/{n E[(∂/∂θ) log f(x; θ)]²}

iff ρ(T, Z) = 1,

i.e. iff T is a linear function of Z, say

T = a_θ + b_θ Z.

But E(T) = a_θ = θ,

i.e. T = θ + b_θ Z.

Let (x₁, …, xₙ) be a random sample from R(0, θ):

f(x, θ) = 1/θ, 0 ≤ x ≤ θ

(∂/∂θ) log f(x, θ) = −1/θ

E[(∂/∂θ) log f(x, θ)]² = 1/θ²

CRB = θ²/n

We know that T = ((n+1)/n) X₍ₙ₎ is the UMVUE, whose variance is

V(T) = θ²/(n(n+2)) < θ²/n

Returning to the proof of the Cramer-Rao inequality: we have ρ(T, Z) = Cov(T, Z)/√(V(T)V(Z)) = 1/√(V(T)V(Z)).

Since [ρ(T, Z)]² ≤ 1, we get

V(T) ≥ 1 / { n E[(∂/∂θ) log f(x; θ)]² }

Remarks: (i) (See the example worked on the left page.)

(ii) If g(θ) is an estimable function for which an unbiased estimator is T (i.e. E(T) = g(θ)), then the C-R inequality becomes

V(T) ≥ [g′(θ)]² / { n E[(∂/∂θ) log f(x; θ)]² }

(iii) It can be shown that

E[(∂/∂θ) log f(x; θ)]² = −E[(∂²/∂θ²) log f(x; θ)]

(iv) If an unbiased estimator exists whose variance is equal to the lower bound CRB = 1/{n E[(∂/∂θ) log f(x; θ)]²}, then it is the UMVUE.

(v) If there is no unbiased estimator whose variance equals the CRB, it does not mean that a UMVUE will not exist. Such estimators can be found (if they exist) by other methods.

(vi) In the case of distributions not satisfying the regularity conditions (e.g. the rectangular distribution), UMVU estimators, if they exist, can be found by other methods. For such cases the UMVU estimator may have variance less than the CRB.

Example: Let (x₁, …, xₙ) be a random sample from a Bernoulli distribution f(x; θ) = θˣ(1 − θ)^{1−x} (x = 0, 1), 0 < θ < 1.

Show that x̄ = (1/n) Σᵢ xᵢ is the UMVUE of θ.

Soln: log f(x; θ) = x log θ + (1 − x) log(1 − θ)

(∂/∂θ) log f(x, θ) = x/θ − (1 − x)/(1 − θ) = (x − θ)/(θ(1 − θ))

so that

E[(∂/∂θ) log f(x, θ)]² = E(x − θ)²/(θ²(1 − θ)²) = θ(1 − θ)/(θ²(1 − θ)²) = 1/(θ(1 − θ))

By the C-R inequality, CRB = θ(1 − θ)/n.

Now E(x̄) = θ and Var(x̄) = θ(1 − θ)/n, which equals the CRB. Hence x̄ is the UMVUE of θ.
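The following sketch (illustrative only; the true θ, sample size and replication count are assumptions) simulates the sampling distribution of x̄ for Bernoulli data and compares its variance with the Cramer-Rao bound θ(1 − θ)/n.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.3, 50, 100_000     # assumed values

x = rng.binomial(1, theta, size=(reps, n))
xbar = x.mean(axis=1)

crb = theta * (1 - theta) / n
print("Var(xbar) ~", xbar.var(), " CRB =", crb)   # the two should agree closely
```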

Example: Let (x₁, …, xₙ) be a random sample from the Binomial distribution

f(x, θ) = (m choose x) θˣ(1 − θ)^{m−x}; x = 0, 1, …, m (0 < θ < 1)

Show that x̄/m is the UMVUE of θ.

Soln: log f(x, θ) = log(m choose x) + x log θ + (m − x) log(1 − θ)

(∂/∂θ) log f(x, θ) = x/θ − (m − x)/(1 − θ) = (x − mθ)/(θ(1 − θ))

so that E[(∂/∂θ) log f(x, θ)]² = E(x − mθ)²/(θ²(1 − θ)²) = mθ(1 − θ)/(θ²(1 − θ)²) = m/(θ(1 − θ))

The CRB is therefore θ(1 − θ)/(mn). Now E(x̄/m) = θ and Var(x̄/m) = θ(1 − θ)/(mn), so that x̄/m is the UMVUE of θ.

Example: Let (x₁, …, xₙ) be a random sample from a Poisson distribution

f(x, θ) = e^{−θ}θˣ/x!; x = 0, 1, … (θ > 0)

Show that x̄ is the UMVUE of θ.

Soln: log f(x, θ) = −θ + x log θ − log x!

(∂/∂θ) log f(x, θ) = −1 + x/θ = (x − θ)/θ

E[(∂/∂θ) log f(x, θ)]² = E(x − θ)²/θ² = 1/θ

The CRB = θ/n.

Now E(x̄) = θ and Var(x̄) = θ/n, so that x̄ is the UMVUE of θ.

Example: Let (x₁, …, xₙ) be a random sample from a normal distribution N(θ, σ²) where the variance σ² is known. Show that x̄ is the UMVUE of θ.

Soln: f(x, θ) = (1/(σ√(2π))) e^{−(x−θ)²/(2σ²)}

log f(x, θ) = log(1/(σ√(2π))) − (x − θ)²/(2σ²)

or (∂/∂θ) log f(x, θ) = (x − θ)/σ²

E[(∂/∂θ) log f(x, θ)]² = E(x − θ)²/σ⁴ = 1/σ²

The CRB = σ²/n.

Now E(x̄) = θ and V(x̄) = σ²/n, so that x̄ is the UMVUE of θ.

Example: Let (x₁, …, xₙ) be a random sample from a normal distribution N(μ, θ), where μ is known and the variance θ is to be estimated. Show that S² = Σᵢ (xᵢ − μ)²/n is the UMVUE of θ.

Soln: f(x; θ) = (1/√(2πθ)) e^{−(x−μ)²/(2θ)}

log f(x; θ) = log(1/√(2π)) − (1/2) log θ − (x − μ)²/(2θ)

or (∂/∂θ) log f(x, θ) = −1/(2θ) + (x − μ)²/(2θ²) = [(x − μ)² − θ]/(2θ²)

E[(∂/∂θ) log f(x, θ)]² = E[(x − μ)² − θ]²/(4θ⁴)

= [E(x − μ)⁴ − 2θE(x − μ)² + θ²]/(4θ⁴)

= [3θ² − 2θ² + θ²]/(4θ⁴)

= 1/(2θ²)

The CRB = 2θ²/n.

Consider the estimator S² = Σᵢ (xᵢ − μ)²/n, for which E(S²) = θ and V(S²) = 2θ²/n, so that S² is the UMVUE of θ.

Example: A UMVU estimator is unique, in the sense that if T₀ and T₁ are both UMVU estimators then T₀ = T₁ almost surely (i.e. P(T₀ ≠ T₁) = 0).

Soln: Since both T₀ and T₁ are unbiased,

E(T₀) = E(T₁) = θ for all θ ∈ Ω

and since both are UMVUE,

V(T₀) = V(T₁) for all θ ∈ Ω.

Consider the new estimator

T = (T₀ + T₁)/2

which is also unbiased. Moreover,

V(T) = (1/4)[V(T₀) + V(T₁) + 2ρ√(V(T₀)V(T₁))] = ((1 + ρ)/2) V(T₀)

where ρ is the correlation coefficient between T₀ and T₁.

By definition, V(T) ≥ V(T₀). It follows that ρ ≥ 1; therefore ρ = 1, so that, for every θ, T₀ and T₁ are linearly related, i.e.

T₀ = a + bT₁

where a, b are constants (which may depend on θ) and b ≥ 0. Taking expectations and variances we get

θ = a + bθ and V(T₀) = b²V(T₁)

which imply that b = 1 and a = 0. Therefore

T₀ = T₁.

CONSISTENCY

Definition: A sequence of estimators {Tₙ}, n = 1, 2, …, of a parameter θ is said to be consistent if, as n → ∞,

Tₙ →ₚ θ for each fixed θ ∈ Ω; that is, Tₙ converges to θ in probability: for any ε (> 0),

P{|Tₙ − θ| > ε} → 0, or P{|Tₙ − θ| ≤ ε} → 1, as n → ∞.

Remarks:

(i) As the sample size increases, a consistent estimator becomes closer and closer to θ (in probability).

(ii) Consistency is essentially a large sample property. We speak of the consistency of a sequence of estimators rather than that of one estimator.

(iii) If {Tₙ} is a sequence of estimators which is consistent for θ and {Cₙ}, {gₙ} are sequences of constants such that Cₙ → 0, gₙ → 1 as n → ∞, then {Tₙ + Cₙ} and {gₙTₙ} are also sequences of consistent estimators.

(iv) We will show later that if {Tₙ} is a sequence of estimators such that E(Tₙ) → θ and V(Tₙ) → 0 as n → ∞, then {Tₙ} is consistent.

Examples:

1. Let (x₁, …, xₙ) be a random sample from any distribution with finite mean θ. Then it follows from the law of large numbers that x̄ →ₚ θ, so that x̄ is consistent for θ. If the distribution has finite variance (σ², say), V(x̄) = σ²/n → 0, so that consistency of x̄ also follows from Remark (iv). It can be shown that the sample median is also consistent for θ.

2. Suppose (x₁, …, xₙ) is a random sample from N(μ, σ²). Let

x̄ = Σᵢ xᵢ/n,

s² = (1/n) Σᵢ (xᵢ − x̄)²,

s′² = (1/(n−1)) Σᵢ (xᵢ − x̄)² = (n/(n−1)) s².

4. The following is an example of an estimator which is unbiased but not consistent.

Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(0, θ) and let Y₁ = min(x₁, …, xₙ). Consider the estimator T = (n + 1)Y₁. This is unbiased. Now for any ε (0 < ε < θ),

P{|T − θ| ≤ ε} = P{|Y₁ − θ/(n+1)| ≤ ε/(n+1)}

= ∫ g(y) dy taken over (θ − ε)/(n+1) ≤ y ≤ (θ + ε)/(n+1), where g(y) = n(θ − y)^{n−1}/θⁿ,

= (1/θⁿ)[(θ − y)ⁿ] evaluated between those limits

= (1/θⁿ) [ ((nθ + ε)/(n+1))ⁿ − ((nθ − ε)/(n+1))ⁿ ]

= (nⁿ/(n+1)ⁿ) [ (1 + ε/(nθ))ⁿ − (1 − ε/(nθ))ⁿ ]

→ e⁻¹ (e^{ε/θ} − e^{−ε/θ}) as n → ∞,

which is some fixed number less than 1. Thus P{|T − θ| ≤ ε} does not tend to 1, and T is not consistent.

(Continuing Example 2:) We can show that

E(s²) = ((n−1)/n) σ², V(s²) = 2σ⁴(n−1)/n²

E(s′²) = σ², V(s′²) = 2σ⁴/(n−1)

By Remark (iv) above, s² and s′² are both consistent for σ²; s² is biased and s′² is unbiased.
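Consistency of s² can be visualised numerically. The sketch below (purely illustrative; μ, σ², the tolerance ε and the grid of sample sizes are assumptions) estimates P{|s² − σ²| ≤ ε} by simulation and shows it approaching 1 as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, eps = 0.0, 4.0, 0.5          # assumed values; eps is the tolerance
reps = 10_000

for n in (10, 100, 500, 2000):
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    s2 = x.var(axis=1)                    # sample variance with divisor n
    prob = np.mean(np.abs(s2 - sigma2) <= eps)
    print(f"n={n:5d}  P(|s2 - sigma2| <= {eps}) ~ {prob:.3f}")
```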

3. Let (x₁, …, xₙ) be a random sample from the gamma distribution

f(x, θ) = (1/(θᵖ Γ(p))) e^{−x/θ} x^{p−1} (x ≥ 0, θ > 0), p known.

Show that X̄/p is unbiased and consistent for θ.

Soln: E(X̄/p) = θ, V(X̄/p) = θ²/(np) → 0,

so X̄/p is unbiased and consistent.

Theorem: If {Tₙ} is a sequence of estimators (of θ) such that

E(Tₙ) = θₙ → θ and V(Tₙ) → 0

as n → ∞, then {Tₙ} is a consistent estimator of θ.

Proof: By Chebyshev's inequality, for any ε (> 0) we have

P{|Tₙ − θ| > ε} ≤ E(Tₙ − θ)²/ε²

= (1/ε²) E[(Tₙ − θₙ) + (θₙ − θ)]²

= (1/ε²) [E(Tₙ − θₙ)² + (θₙ − θ)² + 2(θₙ − θ)E(Tₙ − θₙ)]

= (1/ε²) [V(Tₙ) + (θₙ − θ)²] → 0

as n → ∞ by the given conditions of the theorem, so that Tₙ is consistent for θ.

Theorem: If {Tₙ} is a sequence of consistent estimators of θ and g(θ) is a continuous function of θ, then {g(Tₙ)} is consistent for g(θ).

Proof: Since Tₙ is consistent for θ, for any ε₁ (> 0),

P{|Tₙ − θ| ≤ ε₁} → 1 as n → ∞.

Also, since g is continuous, given ε (> 0) we can choose ε₁ (> 0) such that

|Tₙ − θ| ≤ ε₁ implies |g(Tₙ) − g(θ)| ≤ ε.

Therefore,

P{|Tₙ − θ| ≤ ε₁} ≤ P{|g(Tₙ) − g(θ)| ≤ ε}.

But as n → ∞ the L.H.S. → 1 and, consequently, the R.H.S. → 1, i.e.

P{|g(Tₙ) − g(θ)| ≤ ε} → 1

as n → ∞. Hence g(Tₙ) is consistent for g(θ).

We can prove the following results:

(i) If {Tₙ} is consistent for θ, then Tₙ² is consistent for θ².

(ii) If {Tₙ} is consistent for θ (θ and Tₙ non-negative), then √Tₙ is consistent for √θ.

Proof: For any ε (> 0) we have

P{|Tₙ − θ| ≥ ε} = P{|(√Tₙ − √θ)(√Tₙ + √θ)| ≥ ε}

= P{|√Tₙ − √θ| ≥ ε/(√Tₙ + √θ)}

≥ P{|√Tₙ − √θ| ≥ ε/√θ}

Since the L.H.S. → 0, the R.H.S. → 0 as n → ∞.

(iii) If {Tₙ} is consistent for θ and {T′ₙ} is consistent for θ′, then {Tₙ + T′ₙ} is consistent for θ + θ′.

Proof: For any ε (> 0) we have

P{|(Tₙ + T′ₙ) − (θ + θ′)| ≥ ε}

≤ P{|Tₙ − θ| + |T′ₙ − θ′| ≥ ε}

≤ P{ |Tₙ − θ| ≥ ε/2 ∪ |T′ₙ − θ′| ≥ ε/2 }

≤ P{|Tₙ − θ| ≥ ε/2} + P{|T′ₙ − θ′| ≥ ε/2} → 0

as n → ∞. Therefore {Tₙ + T′ₙ} is consistent for θ + θ′.

(iv) If Tₙ and T′ₙ are consistent for θ and θ′ respectively, then TₙT′ₙ is consistent for θθ′.

Proof: We can write

TₙT′ₙ = [(Tₙ + T′ₙ)² − (Tₙ − T′ₙ)²]/4 →ₚ [(θ + θ′)² − (θ − θ′)²]/4 = θθ′

EFFICIENCY:

If T₁ and T₂ are two unbiased estimators of a parameter θ, each having finite variance, T₁ is said to be more efficient than T₂ if V(T₁) < V(T₂). The (relative) efficiency of T₁ relative to T₂ is defined by

Eff(T₁/T₂) = V(T₂)/V(T₁)

It is usual to judge the efficiency of an unbiased estimator by comparing its variance with the Cramer-Rao lower bound (CRB).

Definition: Assume that the regularity conditions of the C-R inequality hold (we call this a regular situation) for the family {f(x, θ), θ ∈ Ω}. An unbiased estimator T* of θ is called most efficient if V(T*) equals the CRB. In this situation, the 'efficiency' of any other unbiased estimator T of θ is defined by

Eff(T) = V(T*)/V(T)

where T* is the most efficient estimator defined above.

Remarks:

(i) The above definition is not applicable in

(a) regular situations where there is no unbiased estimator whose variance equals the CRB, but a UMVUE exists and may be found by other methods;

(b) non-regular situations, where a UMVUE may exist and may be found by other methods.

(ii) The UMVUE is the 'most efficient' estimator. In the examples considered earlier, all the UMVUEs whose variances equalled the CRB are most efficient.

Example: Consider a random sample (x₁, …, xₙ) from a normal distribution N(μ, θ), where the mean μ is known and the variance θ (0 < θ < ∞) is to be estimated.

We have seen that s² = (1/n) Σᵢ (xᵢ − μ)² is the UMVUE of θ, for which the variance equals the CRB; consequently s² is most efficient. Let s′² = (1/(n−1)) Σᵢ (xᵢ − x̄)².

Then E(s′²) = θ and V(s′²) = 2θ²/(n−1), so that the efficiency of s′² is given by

Eff(s′²) = (2θ²/n)/(2θ²/(n−1)) = (n−1)/n

Asymptotic efficiency: As distinct from the above definition of efficiency, we may define efficiency in another way, which may be called asymptotic efficiency.

Let us confine ourselves to consistent estimators which are asymptotically normally distributed. Among this class, the estimator with the minimum asymptotic variance is called the 'most efficient estimator'. It is also called the best asymptotically normal (BAN) or consistent asymptotically normal efficient (CANE) estimator. If we denote by avar(T*) the asymptotic variance of a BAN estimator T*, then the efficiency of any other estimator T (within the class of asymptotically normal estimators) is defined by

Eff(T/T*) = avar(T*)/avar(T)

where avar(T) is the asymptotic variance of T.

Example: Let (x₁, …, xₙ) be a random sample from a normal distribution N(μ, σ²). Consider the 'most efficient' estimator x̄ and another estimator, the sample median x_med. It can be shown that both are CAN estimators. We have

V(x̄) = σ²/n and V(x_med) ≈ (π/2)(σ²/n)

so that the efficiency of x_med is given by

Eff(x_med) = 2/π
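The 2/π efficiency of the median relative to the mean for normal data can be checked numerically. The sketch below (illustrative; μ, σ, n and the replication count are assumptions) compares the simulated variances of the two estimators.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 0.0, 1.0, 101, 100_000     # assumed values (odd n for a unique median)

x = rng.normal(mu, sigma, size=(reps, n))
mean_est = x.mean(axis=1)
median_est = np.median(x, axis=1)

eff = mean_est.var() / median_est.var()          # should be close to 2/pi ~ 0.637
print("Var(mean)  ~", mean_est.var())
print("Var(median)~", median_est.var())
print("efficiency ~", eff, " (theory:", 2 / np.pi, ")")
```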

Example: Let T₁, T₂ be two unbiased estimators of θ having the same variance. Show that the correlation coefficient ρ between T₁ and T₂ cannot be smaller than 2e − 1, where e is the efficiency of each estimator.

Proof: Let T₀ be the most efficient estimator; then

V(T₁) = V(T₂) = V(T₀)/e

Consider the unbiased estimator

T = (T₁ + T₂)/2

Its variance is

V(T) = (1/4)[V(T₁) + V(T₂) + 2ρ√(V(T₁)V(T₂))]

= (1/4)[V(T₀)/e + V(T₀)/e + 2ρV(T₀)/e]

= ((1 + ρ)/(2e)) V(T₀)

Since T₀ is the UMVUE, V(T) ≥ V(T₀), which gives

(1 + ρ)/(2e) ≥ 1, or ρ ≥ 2e − 1.

Example: Let T₀ be the UMVUE (or most efficient estimator) and let T₁ be an unbiased estimator with efficiency e. If ρ is the correlation coefficient between T₀ and T₁, show that ρ = √e.

Soln: We have

e = V(T₀)/V(T₁), or V(T₁) = V(T₀)/e.

Consider the estimator

T = [(1 − ρ√e)T₀ + √e(√e − ρ)T₁] / (1 − 2ρ√e + e)

(which is the linear combination of T₀, T₁ with minimum variance). Then T is also unbiased, and a direct calculation of its variance gives

V(T) = (1 − ρ²) V(T₀) / (1 − 2ρ√e + e) = [(1 − ρ²) / ((1 − ρ²) + (√e − ρ)²)] V(T₀)

Since (1 − ρ²) and (√e − ρ)² are both non-negative, V(T) ≤ V(T₀); but since T₀ is the UMVUE, V(T) ≥ V(T₀). Therefore V(T) = V(T₀), which forces (√e − ρ)² = 0, and ρ = √e.

SUFFICIENCY CRITERION:

A preliminary choice among statistics for estimating θ, before looking for a UMVUE or a BAN estimator, can be made on the basis of another criterion suggested by R. A. Fisher. This is called the 'sufficiency' criterion.

Definition: Let (x₁, …, xₙ) be a random sample from the distribution of X having p.d.f. f(x, θ), θ ∈ Ω. A statistic T = T(x₁, …, xₙ) is defined to be a sufficient statistic if and only if the conditional distribution of (x₁, …, xₙ) given T = t does not depend on θ, for any value t.

[Note: In such a case, if we know the value of the sufficient statistic T, then the sample values are not needed to tell us anything more about θ.]

Also, the conditional distribution of any other statistic T′ given T is independent of θ.

A necessary and sufficient condition for T to be sufficient for θ is that the joint p.d.f. of (x₁, …, xₙ) can be written in the form

f(x₁, …, xₙ; θ) = g(T, θ) h(x₁, …, xₙ)

where the first factor on the r.h.s. depends on the sample only through T (and on θ), and the second factor is independent of θ. This is known as Neyman's Factorisation Theorem, which provides a simple method of judging whether a statistic T is sufficient.

Remark: Any one-to-one function of a sufficient statistic is also a sufficient statistic.

Example: Consider n Bernoulli trials with probability of success p. The associated Bernoulli random variables (x₁, …, xₙ) have common distribution given by

f(x, p) = pˣ(1 − p)^{1−x}, x = 0, 1

The joint probability function of (x₁, …, xₙ) is

f(x₁, …, xₙ, p) = p^{Σᵢxᵢ} (1 − p)^{n−Σᵢxᵢ} = g(Σᵢ xᵢ, p) h(x₁, …, xₙ)

where g(Σᵢ xᵢ, p) = p^{Σᵢxᵢ}(1 − p)^{n−Σᵢxᵢ} and h(x₁, …, xₙ) = 1.

Therefore Σᵢ xᵢ is sufficient for p, and so is x̄ = Σᵢ xᵢ/n.
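The sufficiency of Σᵢ xᵢ can also be seen directly for small n: conditionally on Σxᵢ = t, every 0-1 sequence with t ones is equally likely, whatever p is. The sketch below (illustrative; n and the two values of p are arbitrary assumptions) computes the conditional probabilities for two different p and shows they coincide.

```python
from itertools import product

def conditional_dist(n, p, t):
    """P(x1..xn | sum = t) for i.i.d. Bernoulli(p); should not depend on p."""
    joint = {seq: p ** sum(seq) * (1 - p) ** (n - sum(seq))
             for seq in product((0, 1), repeat=n)}
    norm = sum(prob for seq, prob in joint.items() if sum(seq) == t)
    return {seq: prob / norm for seq, prob in joint.items() if sum(seq) == t}

n, t = 4, 2
d1 = conditional_dist(n, 0.2, t)
d2 = conditional_dist(n, 0.7, t)
print(d1)   # each sequence with two 1s gets probability 1/6
print("same for p=0.2 and p=0.7:", all(abs(d1[s] - d2[s]) < 1e-12 for s in d1))
```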

Example: Let (x₁, …, xₙ) be a random sample from a Poisson distribution P(λ), i.e.

f(x, λ) = e^{−λ} λˣ/x!, x = 0, 1, …

The joint probability function of (x₁, …, xₙ) is

f(x₁, …, xₙ, λ) = e^{−nλ} λ^{Σᵢxᵢ} / ∏ᵢ xᵢ! = g(Σᵢ xᵢ, λ) h(x₁, …, xₙ)

where g(Σᵢ xᵢ, λ) = e^{−nλ} λ^{Σᵢxᵢ} and h(x₁, …, xₙ) = 1/∏ᵢ xᵢ!.

Hence Σᵢ xᵢ (or Σᵢ xᵢ/n) is sufficient for λ.

Example: Let (x₁, …, xₙ) be a random sample from a normal population N(μ, σ²).

Case I: μ unknown, σ known (= σ₀)

f(x₁, …, xₙ, μ) = (1/(σ₀√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ)²/(2σ₀²)}

= (1/(σ₀√(2π))ⁿ) e^{−[Σᵢxᵢ² + nμ² − 2nx̄μ]/(2σ₀²)}

= g(x̄, μ) h(x₁, …, xₙ)

where g(x̄, μ) = e^{−[nμ² − 2nx̄μ]/(2σ₀²)} and h(x₁, …, xₙ) = (1/(σ₀√(2π))ⁿ) e^{−Σᵢxᵢ²/(2σ₀²)},

which shows that x̄ is sufficient for μ.

Case II: μ known (= μ₀), σ unknown

f(x₁, …, xₙ, σ) = (1/(σ√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ₀)²/(2σ²)} = g(Σᵢ(xᵢ − μ₀)², σ) h(x₁, …, xₙ)

where g(Σᵢ(xᵢ − μ₀)², σ) = (1/(σ√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ₀)²/(2σ²)} and h(x₁, …, xₙ) = 1,

which shows that Σᵢ(xᵢ − μ₀)² is sufficient for σ.

Case III: Both μ and σ unknown

f(x₁, …, xₙ, μ, σ) = (1/(σ√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ)²/(2σ²)}

= (1/(σ√(2π))ⁿ) e^{−[Σᵢxᵢ² − 2μΣᵢxᵢ + nμ²]/(2σ²)}

which shows that [Σᵢ xᵢ, Σᵢ xᵢ²] are jointly sufficient for (μ, σ). Similarly, [x̄, Σᵢ(xᵢ − x̄)²/(n−1)] are also jointly sufficient for (μ, σ).

Example: Let (x₁, …, xₙ) be a random sample from a gamma distribution having p.d.f.

f(x, θ, p) = (1/(θᵖ Γ(p))) e^{−x/θ} x^{p−1}, x ≥ 0

We have

f(x₁, …, xₙ, θ, p) = (1/(θⁿᵖ Γ(p)ⁿ)) e^{−Σᵢxᵢ/θ} (∏ᵢ xᵢ)^{p−1}

Case I: θ unknown but p known

We can write

f(x₁, …, xₙ, θ) = [ (1/(θⁿᵖ Γ(p)ⁿ)) e^{−Σxᵢ/θ} ] [ (∏ᵢ xᵢ)^{p−1} ]

so that Σᵢ xᵢ (or x̄) is sufficient for θ.

Case II: θ known but p unknown

We can write

f(x₁, …, xₙ, p) = [ (1/(θⁿᵖ Γ(p)ⁿ)) (∏ᵢ xᵢ)^{p−1} ] [ e^{−Σxᵢ/θ} ]

so that ∏ᵢ xᵢ is sufficient for p.

Case III: Both θ and p unknown: it is seen that (Σᵢ xᵢ, ∏ᵢ xᵢ) are jointly sufficient for (θ, p).

Example: Let (x₁, …, xₙ) be a random sample from the exponential distribution

f(x, θ) = (1/θ) e^{−x/θ}, x ≥ 0

It follows from the above that Σᵢ xᵢ (or x̄) is sufficient for θ.

Example: Let (x₁, …, xₙ) be a random sample from the distribution with p.d.f.

f(x, θ) = θx^{θ−1}, 0 ≤ x ≤ 1

We have

f(x₁, …, xₙ, θ) = θⁿ(∏ᵢ xᵢ)^{θ−1} = [θⁿ(∏ᵢ xᵢ)^θ][1/∏ᵢ xᵢ]

so that ∏ᵢ xᵢ is sufficient for θ.

Example: Let (x₁, …, xₙ) be a random sample from the Laplace distribution having p.d.f.

f(x, θ) = (1/2) e^{−|x−θ|}, −∞ < x < ∞

We have

f(x₁, …, xₙ, θ) = (1/2ⁿ) e^{−Σᵢ|xᵢ−θ|}

For no single statistic T is it possible to express the above in the form g(T, θ) h(x₁, …, xₙ). Hence there exists no statistic T which, taken alone, is sufficient for θ. However, the whole set (x₁, …, xₙ), or the set of order statistics (x₍₁₎, …, x₍ₙ₎), is jointly sufficient for θ.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(0, θ) having p.d.f.

f(x, θ) = 1/θ, 0 ≤ x ≤ θ

We have

f(x₁, …, xₙ, θ) = (1/θⁿ) ∏ᵢ I₍₀,θ₎(xᵢ)

where I_A(x) is the indicator function:

I_A(x) = 1 if x ∈ A, 0 if x ∉ A.

But ∏ᵢ I₍₀,θ₎(xᵢ) = I₍₀,θ₎(x₍ₙ₎) · I₍₀,x₍ₙ₎₎(x₍₁₎)

where x₍₁₎ and x₍ₙ₎ are the minimum and maximum of the sample values (x₁, …, xₙ).

Therefore we can write

f(x₁, …, xₙ, θ) = g(x₍ₙ₎, θ) h(x₁, …, xₙ)

where g(x₍ₙ₎, θ) = (1/θⁿ) I₍₀,θ₎(x₍ₙ₎) and h(x₁, …, xₙ) = I₍₀,x₍ₙ₎₎(x₍₁₎),

which shows that x₍ₙ₎ is sufficient for θ.

Example: If X has p.d.f.

f(x, θ) = 1/θ, −θ ≤ x ≤ 0

we can check that

f(x₁, …, xₙ, θ) = (1/θⁿ) I₍₋θ,₀₎(x₍₁₎) · I₍ₓ₍₁₎,₀₎(x₍ₙ₎)

so that x₍₁₎ is sufficient for θ.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(θ₁, θ₂) having p.d.f.

f(x, θ₁, θ₂) = 1/(θ₂ − θ₁), θ₁ ≤ x ≤ θ₂; 0 elsewhere.

The joint p.d.f. of (x₁, …, xₙ) is given by

f(x₁, …, xₙ, θ₁, θ₂) = (1/(θ₂ − θ₁)ⁿ) ∏ᵢ I₍θ₁,θ₂₎(xᵢ)

where I₍θ₁,θ₂₎(xᵢ) = 1 if θ₁ ≤ xᵢ ≤ θ₂, and 0 elsewhere.

We can write

∏ᵢ I₍θ₁,θ₂₎(xᵢ) = I₍θ₁,θ₂₎(x₍₁₎) · I₍θ₁,θ₂₎(x₍ₙ₎)

= g(x₍₁₎, x₍ₙ₎, θ₁, θ₂) h(x₁, …, xₙ)

where g(x₍₁₎, x₍ₙ₎, θ₁, θ₂) = I₍θ₁,θ₂₎(x₍₁₎) I₍θ₁,θ₂₎(x₍ₙ₎) and h(x₁, …, xₙ) = 1.

Hence [x₍₁₎, x₍ₙ₎] are jointly sufficient for (θ₁, θ₂).

Corollary: If θ₁ is known, x₍ₙ₎ is sufficient for θ₂; if θ₂ is known, x₍₁₎ is sufficient for θ₁.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(−θ, θ):

f(x, θ) = 1/(2θ), −θ ≤ x ≤ θ

Then

f(x₁, …, xₙ, θ) = (1/(2θ)ⁿ) ∏ᵢ I₍₋θ,θ₎(xᵢ) = (1/(2θ)ⁿ) I₍₋θ,θ₎(x₍₁₎) I₍₋θ,θ₎(x₍ₙ₎)

so that [x₍₁₎, x₍ₙ₎] are jointly sufficient for θ.

Example: Similarly, [x₍₁₎, x₍ₙ₎] are jointly sufficient for θ in R(θ − 1/2, θ + 1/2) and in R(θ, θ + 1).

Example: Let (x₁, …, xₙ) be a random sample from the exponential distribution

f(x) = λ e^{−λ(x−θ)}, θ ≤ x < ∞

Case I: λ unknown, θ known (= θ₀)

f(x₁, …, xₙ, λ) = λⁿ e^{−λ Σᵢ(xᵢ−θ₀)} ∏ᵢ I₍θ₀,∞₎(xᵢ)

which shows that Σᵢ(xᵢ − θ₀) is sufficient for λ, or x̄ is sufficient for λ.

Case II: λ known (= λ₀), θ unknown

f(x₁, …, xₙ, θ) = λ₀ⁿ e^{−λ₀ Σᵢ(xᵢ−θ)} ∏ᵢ I₍θ,∞₎(xᵢ)

= λ₀ⁿ e^{−λ₀ Σᵢ xᵢ + nλ₀θ} I₍θ,∞₎(x₍₁₎)

= { e^{nλ₀θ} I₍θ,∞₎(x₍₁₎) } { λ₀ⁿ e^{−λ₀ Σᵢ xᵢ} }

which shows that x₍₁₎ is sufficient for θ.

Case III: Both λ, θ unknown

It is easy to check that [Σᵢ xᵢ, x₍₁₎] are jointly sufficient for (λ, θ).

METHODS OF ESTIMATION:

Four important methods of obtaining estimators are (I) the method of moments, (II) the method of maximum likelihood, (III) the method of minimum χ² and (IV) the method of least squares.

(I) Method of moments

Suppose the distribution of a random variable X has k parameters (θ₁, θ₂, …, θ_k) which have to be estimated. Let μ′ᵣ = E(Xʳ) denote the rth moment about 0. In general μ′ᵣ is a known function of θ₁, …, θ_k, say μ′ᵣ = μ′ᵣ(θ₁, …, θ_k). Let (x₁, …, xₙ) be a random sample from the distribution of X and let m′ᵣ = Σᵢ xᵢʳ/n be the rth sample moment. Form the equations

m′ᵣ = μ′ᵣ(θ₁, …, θ_k), r = 1, …, k

whose solution, say θ̂₁, …, θ̂_k, gives the estimate θ̂ᵢ of θᵢ (i = 1, …, k). These are the method of moments estimators of the parameters.

Example: Let X ~ N(μ, σ²).

μ′₁ = μ, μ′₂ = σ² + μ²

The equations

x̄ = μ and Σᵢ xᵢ²/n = σ² + μ²

have the solution

μ̂ = x̄, σ̂ = √(Σᵢ xᵢ²/n − x̄²) = √(Σᵢ(xᵢ − x̄)²/n)
Example: Let X ~ P(λ) and let (x₁, …, xₙ) be a random sample from P(λ).

μ′₁ = λ

The equation x̄ = λ provides the estimator λ̂ = x̄.

Example: Let (x₁, …, xₙ) be a random sample from the exponential distribution

f(x, θ) = θe^{−θx}, x ≥ 0

μ′₁ = 1/θ

The moment equation x̄ = 1/θ provides the estimator θ̂ = 1/x̄.
Remarks: (I) The method of moments estimators are not uniquely defined; we may equate central moments instead of raw moments and obtain other solutions.

(II) These estimators are not, in general, consistent and efficient; they will be so only if the parent distribution is of a particular form.

(III) When population moments do not exist (e.g. the Cauchy population), this method of estimation is inapplicable.
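A short numerical sketch of the method of moments for the normal example above (illustrative only; the data are simulated with assumed true values):

```python
import numpy as np

rng = np.random.default_rng(6)
true_mu, true_sigma = 3.0, 2.0                 # assumed true parameters
x = rng.normal(true_mu, true_sigma, size=500)  # simulated sample

mu_hat = x.mean()                                    # first moment equation
sigma_hat = np.sqrt((x ** 2).mean() - mu_hat ** 2)   # second moment equation

print("method-of-moments estimates:", mu_hat, sigma_hat)
```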

METHOD OF MAXIMUM LIKELIHOOD

Consider f(x₁, …, xₙ; θ), the joint p.d.f. of a sample (x₁, …, xₙ) of observations on a r.v. X having the p.d.f. f(x, θ), whose parameter θ is to be estimated. When the values (x₁, …, xₙ) are given, f(x₁, …, xₙ; θ) may be looked upon as a function of θ, which is called the likelihood function of θ and is denoted by L(θ) = L(θ; x₁, …, xₙ). It gives the likelihood that the random vector (X₁, …, Xₙ) assumes the value (x₁, …, xₙ) when θ is the parameter.

We want to know for which distribution (i.e. for what value of θ) the likelihood is largest for this set of observations. In other words, we want to find the value of θ, denoted by θ̂, which maximizes L(x₁, …, xₙ; θ). The value θ̂ maximizing the likelihood function is, in general, a function of x₁, …, xₙ, say

θ̂ = θ̂(x₁, …, xₙ)

such that L(θ̂) = max_{θ ∈ Ω} L(θ; x₁, …, xₙ).

Then θ̂ is called the maximum likelihood estimator or MLE.

In many cases it is more convenient to work with log L(θ) rather than L(θ), since log L(θ) is maximized at the same value of θ as L(θ). For obtaining the m.l.e. we find the value of θ for which

(∂/∂θ) log L(θ) = 0 ………(1)

We must, however, check that this provides the absolute maximum. If the derivative does not exist at θ = θ̂, or equation (1) is not solvable, this method of solving (1) will fail.

Example: Let (x₁, …, xₙ) be a random sample from the Bernoulli distribution

f(x, θ) = θˣ(1 − θ)^{1−x}, x = 0, 1

Then the likelihood is L(θ; x₁, …, xₙ) = θ^{Σᵢxᵢ}(1 − θ)^{n−Σᵢxᵢ}

and log L(θ) = (Σᵢ xᵢ) log θ + (n − Σᵢ xᵢ) log(1 − θ)

Differentiating and equating to zero, we have

(∂/∂θ) log L(θ) = 0

i.e. Σᵢxᵢ/θ − (n − Σᵢxᵢ)/(1 − θ) = 0

or (Σᵢxᵢ − nθ)/(θ(1 − θ)) = 0

or θ = Σᵢ xᵢ/n = x̄

The m.l.e. of θ is θ̂ = x̄.
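The same answer can be checked numerically. The sketch below (illustrative; the data are simulated with an assumed true θ) evaluates the Bernoulli log-likelihood on a grid and confirms it peaks at x̄.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.35, size=200)       # simulated Bernoulli sample, assumed true theta = 0.35

theta_grid = np.linspace(0.001, 0.999, 2000)
loglik = x.sum() * np.log(theta_grid) + (len(x) - x.sum()) * np.log(1 - theta_grid)

print("grid maximiser  :", theta_grid[np.argmax(loglik)])
print("closed-form MLE :", x.mean())
```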

Example: Let (x₁, …, xₙ) be a random sample from the Poisson distribution

f(x, λ) = e^{−λ}λˣ/x!, x = 0, 1, 2, …

Then L(λ; x₁, …, xₙ) = e^{−nλ} λ^{Σᵢxᵢ} / ∏ᵢ xᵢ!

and log L(λ) = −nλ + (Σᵢ xᵢ) log λ − Σᵢ log xᵢ!

(∂/∂λ) log L(λ) = −n + Σᵢ xᵢ/λ

Equating to zero we get λ = x̄.

The m.l.e. of λ is λ̂ = x̄.

Example: Let (x₁, …, xₙ) be a random sample from the truncated Binomial distribution having p.d.f.

f(x, θ) = (2 choose x) θˣ(1 − θ)^{2−x} / [1 − (1 − θ)²], x = 1, 2 (0 < θ < 1)

Then L(θ; x₁, …, xₙ) = ∏ᵢ (2 choose xᵢ) · θ^{Σxᵢ}(1 − θ)^{2n−Σxᵢ} / [1 − (1 − θ)²]ⁿ

and log L(θ) = Σᵢ log(2 choose xᵢ) + (Σxᵢ) log θ + (2n − Σxᵢ) log(1 − θ) − n log[1 − (1 − θ)²]

(∂/∂θ) log L(θ) = Σᵢxᵢ/θ − (2n − Σᵢxᵢ)/(1 − θ) − 2n(1 − θ)/[1 − (1 − θ)²]

Equating to zero and using 1 − (1 − θ)² = θ(2 − θ), the equation simplifies to

Σxᵢ (2 − θ) = 2n

or 2 − θ = 2n/Σxᵢ = 2/x̄

or θ = 2 − 2/x̄

The m.l.e. is θ̂ = 2 − 2/x̄.

Example: Let (x₁, …, xₙ) be a random sample from the normal distribution N(μ, σ²).

Case I: μ unknown but σ = σ₀ (known)

Then L(μ; x₁, …, xₙ) = (1/(σ₀√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ)²/(2σ₀²)}

and log L(μ) = −n log(σ₀√(2π)) − Σᵢ(xᵢ − μ)²/(2σ₀²)

(∂/∂μ) log L(μ) = Σᵢ(xᵢ − μ)/σ₀²

Equating to zero we get μ = x̄. The m.l.e. of μ is μ̂ = x̄.

Case II: μ = μ₀ (known) but σ unknown

Then L(σ; x₁, …, xₙ) = (1/(σ√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ₀)²/(2σ²)}

and log L(σ) = −n log σ − (n/2) log(2π) − Σᵢ(xᵢ − μ₀)²/(2σ²)

(∂/∂σ) log L(σ) = −n/σ + Σᵢ(xᵢ − μ₀)²/σ³

Equating to zero we get σ = √(Σᵢ(xᵢ − μ₀)²/n). The m.l.e. of σ is σ̂ = √(Σᵢ(xᵢ − μ₀)²/n).

Case III: Both μ and σ unknown

Then L(μ, σ; x₁, …, xₙ) = (1/(σ√(2π))ⁿ) e^{−Σᵢ(xᵢ−μ)²/(2σ²)}

and log L(μ, σ) = −n log σ − (n/2) log(2π) − Σ(xᵢ − μ)²/(2σ²)

Differentiating partially w.r.t. μ and σ we get

(∂/∂μ) log L(μ, σ) = Σ(xᵢ − μ)/σ²

(∂/∂σ) log L(μ, σ) = −n/σ + Σ(xᵢ − μ)²/σ³

Equating both derivatives to zero and solving, we get μ = x̄ and σ = √(Σᵢ(xᵢ − x̄)²/n).

The m.l.e.s are μ̂ = x̄ and σ̂ = √(Σᵢ(xᵢ − x̄)²/n).

Example: Let (x₁, …, xₙ) be a random sample from the exponential distribution

f(x, θ) = (1/θ) e^{−x/θ}, x ≥ 0

Then L(θ; x₁, …, xₙ) = (1/θⁿ) e^{−Σᵢxᵢ/θ}

and log L(θ) = −n log θ − Σᵢ xᵢ/θ

(∂/∂θ) log L(θ) = −n/θ + Σᵢ xᵢ/θ²

Equating to zero, we get θ = x̄, so that the m.l.e. of θ is θ̂ = x̄.

Example: Let (x₁, …, xₙ) be a random sample from the exponential distribution

f(x, θ) = e^{−(x−θ)}, x ≥ θ

Then L(θ; x₁, …, xₙ) = e^{−n(x̄−θ)}.

If we differentiate log L(θ) w.r.t. θ and equate to zero we get n = 0, which does not yield any result.

Now L(θ) is maximized by choosing the largest value of θ subject to the condition

θ ≤ x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍ₙ₎

which shows that θ = x₍₁₎, so that the m.l.e. of θ is θ̂ = x₍₁₎.

Example: X has p.d.f.

f(x, λ, θ) = λ e^{−λ(x−θ)}, x ≥ θ

The m.l.e.s are θ̂ = x₍₁₎ and λ̂ = 1/(x̄ − x₍₁₎).

Example: Let (x₁, …, xₙ) be a random sample from the distribution

f(x, θ) = θx^{θ−1}, 0 ≤ x ≤ 1 (θ > 0)

Then L(θ; x₁, …, xₙ) = θⁿ(∏ᵢ xᵢ)^{θ−1}

and log L(θ) = n log θ + (θ − 1) Σᵢ log xᵢ

(∂/∂θ) log L(θ) = n/θ + Σᵢ log xᵢ

Equating to zero we get θ = −n/Σᵢ log xᵢ.

The m.l.e. is θ̂ = −n/Σᵢ log xᵢ.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(0, θ) having p.d.f.

f(x, θ) = 1/θ, 0 ≤ x ≤ θ

Then L(θ; x₁, …, xₙ) = 1/θⁿ, 0 ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ θ,

which is maximized when θ is as small as possible subject to the condition

0 ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ θ.

The smallest admissible value of θ is x₍ₙ₎, so that the m.l.e. of θ is θ̂ = x₍ₙ₎.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(−θ, 0) having p.d.f.

f(x, θ) = 1/θ, −θ ≤ x ≤ 0

Then L(θ; x₁, …, xₙ) = 1/θⁿ, −θ ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ 0,

which is maximized when θ is as small as possible subject to −θ ≤ x₍₁₎, i.e. θ ≥ −x₍₁₎.

The m.l.e. of θ is θ̂ = −x₍₁₎.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(θ₁, θ₂) having p.d.f.

f(x, θ₁, θ₂) = 1/(θ₂ − θ₁), θ₁ ≤ x ≤ θ₂

Then L(θ₁, θ₂; x₁, …, xₙ) = 1/(θ₂ − θ₁)ⁿ, θ₁ ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ θ₂,

which is maximized when (θ₂ − θ₁) is as small as possible, i.e. θ₁ is as large and θ₂ as small as possible, subject to the condition

θ₁ ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ θ₂.

We have to take θ₁ = x₍₁₎ and θ₂ = x₍ₙ₎, so that the m.l.e.s of θ₁ and θ₂ are θ̂₁ = x₍₁₎ and θ̂₂ = x₍ₙ₎.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(θ − c, θ + c) having p.d.f.

f(x, θ) = 1/(2c), θ − c ≤ x ≤ θ + c

Then L(θ; x₁, …, xₙ) = 1/(2c)ⁿ for θ − c ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ θ + c, and is maximized for any θ such that

θ − c ≤ x₍₁₎ and x₍ₙ₎ ≤ θ + c,

i.e. x₍ₙ₎ − c ≤ θ ≤ x₍₁₎ + c.

This shows that any statistic which lies between x₍ₙ₎ − c and x₍₁₎ + c, e.g. (x₍₁₎ + x₍ₙ₎)/2, is an m.l.e.; the m.l.e. is not unique in this case.

Example: If X has R(θ, θ + 1), any statistic which lies between x₍ₙ₎ − 1 and x₍₁₎ is an m.l.e. of θ.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(θ, 2θ) having p.d.f.

f(x, θ) = 1/θ, θ ≤ x ≤ 2θ

Then L(θ; x₁, …, xₙ) = 1/θⁿ, θ ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ 2θ,

is maximized when θ is as small as possible subject to the conditions

θ ≤ x₍₁₎ ……(i) and 2θ ≥ x₍ₙ₎, i.e. θ ≥ x₍ₙ₎/2 ……(ii)

The smallest value of θ satisfying (i) and (ii) is x₍ₙ₎/2, so that the m.l.e. of θ is

θ̂ = x₍ₙ₎/2.

Example: Let (x₁, …, xₙ) be a random sample from the rectangular distribution R(−θ, θ) having p.d.f.

f(x, θ) = 1/(2θ), −θ ≤ x ≤ θ

Then

L(θ; x₁, …, xₙ) = 1/(2θ)ⁿ, −θ ≤ x₍₁₎ ≤ … ≤ x₍ₙ₎ ≤ θ.

This is maximized when θ is as small as possible subject to the conditions

x₍ₙ₎ ≤ θ, i.e. θ ≥ x₍ₙ₎, and −θ ≤ x₍₁₎, i.e. θ ≥ −x₍₁₎.

This happens when θ = max(−x₍₁₎, x₍ₙ₎).

The m.l.e. of θ is θ̂ = max(−x₍₁₎, x₍ₙ₎).

Example: Let (x₁, …, xₙ) be a random sample from the Laplace distribution with p.d.f.

f(x, θ) = (1/2) e^{−|x−θ|}, −∞ < x < ∞

Then

L(θ; x₁, …, xₙ) = (1/2ⁿ) e^{−Σᵢ|xᵢ−θ|}

and log L(θ) = −n log 2 − Σᵢ|xᵢ − θ|,

which is maximized when Σᵢ|xᵢ − θ| is smallest, i.e. when θ is the sample median.

The m.l.e. of θ is θ̂ = sample median.

Example: Let x₁, …, xₙ be n independent r.v.s such that xᵣ has the normal distribution N(rθ, r³σ²). We have to estimate θ. Then

L(θ; x₁, …, xₙ) = ∏ᵣ (1/√(2πr³σ²)) e^{−(xᵣ−rθ)²/(2r³σ²)}

and log L(θ) = const − (1/(2σ²)) Σᵣ (xᵣ − rθ)²/r³

(∂/∂θ) log L(θ) = (1/σ²) Σᵣ (xᵣ − rθ)/r²

Equating to zero, we get

Σᵣ xᵣ/r² = θ Σᵣ (1/r)

or θ = (Σᵣ xᵣ/r²) / (Σᵣ 1/r)

The m.l.e. of θ is

θ̂ = (Σᵣ xᵣ/r²) / (Σᵣ 1/r)

We have E(θ̂) = θ and V(θ̂) = σ² / Σᵣ(1/r).
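A simulation sketch of this weighted estimator (illustrative; θ, σ, n and the replication count are assumptions) verifies E(θ̂) = θ and V(θ̂) = σ²/Σᵣ(1/r):

```python
import numpy as np

rng = np.random.default_rng(8)
theta, sigma, n, reps = 1.5, 2.0, 20, 100_000      # assumed values
r = np.arange(1, n + 1)

x = rng.normal(loc=r * theta, scale=np.sqrt(r ** 3) * sigma, size=(reps, n))
theta_hat = (x / r ** 2).sum(axis=1) / (1 / r).sum()   # the MLE derived above

print("E(theta_hat) ~", theta_hat.mean(), " (theta =", theta, ")")
print("V(theta_hat) ~", theta_hat.var(), " (theory:", sigma ** 2 / (1 / r).sum(), ")")
```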

Optimum properties of MLE: (i) If θ̂ is the m.l.e. of θ and Ψ(θ) is a single-valued function of θ with a unique inverse, then Ψ(θ̂) is the m.l.e. of Ψ(θ).

(ii) If a sufficient statistic exists for θ, the m.l.e. θ̂ is a function of this sufficient statistic.

(iii) Suppose f(x, θ) satisfies certain regularity conditions and θ̂ₙ = θ̂ₙ(x₁, …, xₙ) is the m.l.e. based on a random sample of size n from f(x, θ). Then:

(a) {θ̂ₙ} is a consistent sequence of estimators of θ;

(b) θ̂ₙ is asymptotically normally distributed with mean θ and variance 1/{n E[(∂/∂θ) log f(x, θ)]²};

(c) the sequence of estimators θ̂ₙ has the smallest asymptotic variance among all consistent, asymptotically normally distributed estimators of θ, i.e. θ̂ₙ is BAN (or CANE, or most efficient).

(III) METHOD OF MINIMUM χ²: Let X be a r.v. with p.d.f. f(x, θ), where the parameter to be estimated is θ = (θ₁, …, θᵣ). Suppose S₁, S₂, …, S_k are k mutually exclusive classes which form a partition of the range of X. Let the probability that X falls in Sⱼ be

pⱼ(θ) = ∫_{Sⱼ} f(x, θ) dx, j = 1, 2, …, k

where Σⱼ pⱼ(θ) = 1.

Suppose, in practice, corresponding to a random sample of n observations from the distribution of X we are given the frequencies (N₁, …, N_k), where Nⱼ = observed number of sample observations falling in the class Sⱼ (j = 1, 2, …, k), such that Σⱼ Nⱼ = n. Then the expected number of observations in Sⱼ is npⱼ(θ). Define

χ² = Σⱼ [nⱼ − npⱼ(θ)]² / (npⱼ(θ))

where nⱼ is the observed value of Nⱼ (j = 1, 2, …, k). Evidently χ² will be a function of θ (or of θ₁, …, θᵣ). To obtain the estimator of θ we minimise χ² w.r.t. θ. The minimum χ² estimator of θ is that θ̂ which minimises the above χ².

The equation(s) for determining the estimator(s) by this method are

∂χ²/∂θ = 0 (or ∂χ²/∂θᵢ = 0, i = 1, …, r)

Remarks:

(i) Often it is difficult to obtain the θ̂ which minimises χ²; hence χ² is changed to the modified

χ² = Σⱼ [nⱼ − npⱼ(θ)]² / nⱼ

(if nⱼ = 0, unity is used). The modified minimum χ² estimator of θ is the θ̂ which minimises the modified χ².

(ii) For large n, the minimum χ² and likelihood equations are identical and, consequently, provide identical minimum χ² and maximum likelihood estimators.

(iii) The minimum χ² estimators are consistent, asymptotically normal and efficient.

Example: Let (x₁, …, xₙ) be a random sample from a Bernoulli distribution having p.d.f.

f(x, θ) = θˣ(1 − θ)^{1−x}, x = 0, 1

Take Nⱼ = the number of observations equal to j, for j = 0, 1. Here the range of X is partitioned into the two sets consisting of the values 0 and 1 respectively. Then

p₀(θ) = P(X = 0) = 1 − θ and p₁(θ) = P(X = 1) = θ

and

χ² = Σⱼ [nⱼ − npⱼ(θ)]²/(npⱼ(θ))

= [n₀ − n(1 − θ)]²/(n(1 − θ)) + [n₁ − nθ]²/(nθ)

= [n − n₁ − n(1 − θ)]²/(n(1 − θ)) + [n₁ − nθ]²/(nθ)

= ([n₁ − nθ]²/n) · 1/(θ(1 − θ))

By inspection χ² = 0 for θ = n₁/n; therefore θ̂ = n₁/n. This is the same as what would be obtained by the method of moments or the method of maximum likelihood.
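For cases where the minimising θ is not obvious by inspection, a simple numerical minimisation works. The sketch below (illustrative; the observed counts are made up) minimises the Bernoulli χ² over a grid and recovers n₁/n.

```python
import numpy as np

n0, n1 = 38, 62                       # made-up observed frequencies of 0s and 1s
n = n0 + n1

theta = np.linspace(0.001, 0.999, 10_000)
expected0, expected1 = n * (1 - theta), n * theta
chi2 = (n0 - expected0) ** 2 / expected0 + (n1 - expected1) ** 2 / expected1

theta_hat = theta[np.argmin(chi2)]
print("minimum chi-square estimate ~", theta_hat, " (n1/n =", n1 / n, ")")
```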

(IV) METHOD OF LEAST SQUARES: Suppose Y is a random variable whose value depends on the value of a (non-random) variable x. For example, the weight (Y) of a baby depends on its age (x); the temperature (Y) of a place at a given time depends on its altitude (x); the salary (Y) of an individual at a given age depends on the number of years (x) of formal education he has had; the maintenance cost (Y) per year of an automobile depends on its age (x); etc.

We assume that the distribution of the r.v. Y is such that, for a given x, E(Y|x) is a linear function of x, while the variance and higher moments of Y are independent of x. That is, we assume the linear model

E(Y|x) = α + βx

where α and β are two parameters. We also write

Y = α + βx + ε

where ε is a r.v. such that E(ε) = 0, V(ε) = σ².

The problem is to estimate the parameters α and β on the basis of a random sample of n observations (y₁, x₁), (y₂, x₂), …, (yₙ, xₙ).

The method of least squares specifies that we should take as our estimates of α and β those values that minimise

Σᵢ [yᵢ − α − βxᵢ]²

where yᵢ is the observed value of Y and xᵢ is the associated value of x. Thus we minimise the sum of squares of the residuals when applying the method of least squares.

The least squares estimators of α and β are

β̂ = Σᵢ(yᵢ − ȳ)(xᵢ − x̄) / Σᵢ(xᵢ − x̄)²

and α̂ = ȳ − β̂x̄.

Remarks: The least squares estimators do not, in general, have any optimum properties even asymptotically. However, in linear estimation this method provides good estimators in small samples: they are minimum variance unbiased estimators among the class of linear functions of the Y's.
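A minimal numerical sketch of these formulas (illustrative; the data are simulated with assumed α, β, σ):

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, beta, sigma = 1.0, 2.5, 0.5          # assumed true parameters
x = np.linspace(0, 10, 50)
y = alpha + beta * x + rng.normal(0, sigma, size=x.size)

beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print("least squares estimates: alpha ~", alpha_hat, " beta ~", beta_hat)
```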



TESTING OF HYPOTHESIS

(NEYMAN-PEARSON THEORY)

Let X be a r.v. with p.d.f. f(x, θ), where θ (possibly a vector (θ₁, …, θ_k)) is an unknown parameter. A random sample of n observations is denoted by E = (x₁, …, xₙ), which takes values, in general, in the n-dimensional real space Rⁿ. The parameter space (the set of all possible values of the parameter) is denoted by Ω, say. For any subset A ⊂ Rⁿ we can calculate

P_θ(E ∈ A) = ∫_A [∏ᵢ f(xᵢ, θ)] dx₁ … dxₙ

which will depend on θ.

Definition: A statistical hypothesis is a statement about the parameter θ in the form H: θ ∈ ω (⊂ Ω).

For example, consider

H: θ = θ₀, or H: θ ≥ θ₀, or H: θ ≠ θ₀, or H: θ₁ < θ < θ₂.

Definition: If a hypothesis specifies an exact value of the parameter θ, it is called a simple hypothesis, e.g. H: θ = θ₀; in this case ω in H: θ ∈ ω is a set consisting of a single point.

If a hypothesis does not fully specify the value of θ (but gives a set of possible values only), it is called a composite hypothesis, e.g. H: θ ≠ θ₀ or H: θ ≥ θ₀, etc. In this case ω in H: θ ∈ ω is a set of more than one point.

Definition: The hypothesis which is actually being tested is called the null hypothesis, and the other hypothesis which is stated as the alternative to the null hypothesis is called the alternative hypothesis. For example, the null hypothesis may be H₀: θ = θ₀ and the alternative may be H₁: θ ≠ θ₀, or H₁: θ > θ₀, or H₁: θ < θ₀, etc.

Both null and alternative hypotheses may be simple or composite. For our study we shall usually take the null hypothesis to be simple.

Suppose we want to test a null hypothesis H₀ against an alternative hypothesis H₁ on the basis of a random sample E = (X₁, …, Xₙ), in the sense that we have to decide whether to reject or accept H₀.

Definition: A statistical test of a (null) hypothesis H₀ against an alternative hypothesis H₁ is a rule or procedure for deciding whether to reject or accept H₀ on the basis of the sample E = (X₁, …, Xₙ). It specifies a partition of the sample space Rⁿ into two disjoint subsets W and W̄ = Rⁿ − W such that we reject H₀ when E ∈ W and accept H₀ when E ∈ W̄. [We note that the rejection of H₀ amounts to acceptance of H₁, and vice versa.]

Definition: The set W, corresponding to a test T, which is such that we reject H₀ when E ∈ W, is called the critical region of the test, while W̄ is called its acceptance region. Different tests have different critical regions.

Two types of errors: In a testing problem we are liable to commit two types of error. Suppose H₀ is true and yet E ∈ W, so that we reject H₀; this is called the Type I error, which occurs when we reject the null hypothesis when it is actually true. On the other hand, suppose H₀ is false and H₁ is true and yet E ∈ W̄, so that we accept H₀; this is called the Type II error, which occurs when we accept the null hypothesis when it is actually false. We denote by α and β the probabilities of Type I error and Type II error respectively, i.e.

α = P{Reject H₀ | H₀ is true} = P{E ∈ W | θ ∈ H₀}

and β = P{Accept H₀ | H₀ is false} = P{E ∈ W̄ | θ ∈ H₁}

Definition: The probability of Type I error for a test T, denoted by α, is called the 'size' or level of significance of the test T.

Remark: If H₀ is simple (say H₀: θ = θ₀), α is clearly defined; when H₀ is composite (say H₀: θ ∈ ω), we take

α = sup_{θ ∈ H₀} P{E ∈ W | θ}

Definition: For a test T having the critical region W, the power function P_T(θ) is defined by

P_T(θ) = P{Reject H₀ | θ} = P_θ{E ∈ W}

as a function of θ. Evidently,

P_T(θ) = α for θ ∈ H₀ and P_T(θ) = 1 − β for θ ∈ H₁.

If we could find a test of the given hypothesis for which both α and β are minimum, it would be the best. Unfortunately, it is not possible to minimise both errors simultaneously for a fixed sample size. Consider two tests T₁ and T₂ defined as follows: T₁ always rejects H₀, i.e. its critical region is W₁ = Rⁿ, while T₂ always accepts H₀, i.e. its critical region is W₂ = ∅. Then for T₁, α = 1 and β = 0, while for T₂, α = 0 and β = 1. This shows that if the probability of Type I error is made minimum, the probability of Type II error becomes maximum, and vice versa. What is done is to fix α, taking α to be quite small (in practice α = .05 or .01), so that only tests of size α are considered. Among all tests of a given size α, comparison is made on the basis of their power functions. If T and T′ are two tests (for the same testing problem) of the same size α, T is said to be better than T′ if its power is greater than the power of T′ under the alternative hypothesis (equivalently, the probability of Type II error for T is less than the probability of Type II error for T′).

Simple hypothesis against a simple alternative: Consider the testing problem

H₀: θ = θ₀ vs H₁: θ = θ₁ (≠ θ₀)

Definition: A test T* is called a most powerful (MP) test of size α (0 < α < 1) if and only if its probability of Type I error is equal to α and its power P_{T*}(θ₁) is not less than the power P_T(θ₁) of any other test T of size α, i.e.

(i) P_{T*}(θ₀) = α

(ii) P_{T*}(θ₁) ≥ P_T(θ₁) for any other test T of size α.

[This means that the probability of Type II error for T* is less than or equal to that of any other test of size α.]

Simple hypothesis against a composite alternative: Consider the testing problem

H₀: θ = θ₀ vs H₁: θ ≠ θ₀ (or θ > θ₀, or θ < θ₀)

Definition: A test T* is called a uniformly most powerful (UMP) test of size α (0 < α < 1) if its probability of Type I error is equal to α and its power function is such that

P_{T*}(θ) ≥ P_T(θ) for all θ ∈ H₁ and all other tests T of size α.

Example: Let X be a r.v. having the exponential distribution

f(x, θ) = θe^{−θx} (x ≥ 0)

and suppose we want to test

H₀: θ = 2 against H₁: θ = 1.

Let the sample consist of only one observation X, and consider two tests T and T′ with critical regions W = {X ≥ 1} and W′ = {X ≤ 0.07} respectively.

The probabilities of the two errors for T are

α = P{X ≥ 1 | θ = 2} = 2∫₁^∞ e^{−2x} dx = e^{−2} ≈ 0.135

β = P{X < 1 | θ = 1} = ∫₀¹ e^{−x} dx = 1 − e^{−1} ≈ 0.632

The probabilities of the two errors for T′ are

α′ = P{X ≤ 0.07 | θ = 2} = 2∫₀^{0.07} e^{−2x} dx = 1 − e^{−0.14} ≈ 0.13

β′ = P{X > 0.07 | θ = 1} = ∫_{0.07}^∞ e^{−x} dx = e^{−0.07} ≈ 0.932

Obviously T is better than T′.

Example: A two-faced coin is tossed six times; the probability of getting a head in a toss is θ and the probability of getting a tail is (1 − θ). It is required to test the hypothesis

H₀: θ = θ₀ = 1/2 against H₁: θ = θ₁ = 2/3.

If the test consists in rejecting H₀ when heads appear more than four times and accepting H₀ otherwise, find α and β.

Soln: α = P{Reject H₀ | θ = θ₀} = P{X ≥ 5 | θ = 1/2} = 7/2⁶

β = 1 − P{Reject H₀ | θ = θ₁} = 1 − P{X ≥ 5 | θ = 2/3} = 1 − 2⁸/3⁶

Example: Let X have an exponential distribution

f(x, θ) = (1/θ) e^{−x/θ}, x ≥ 0

It is required to test H₀: θ = 1 against H₁: θ = 4. Find α and β for the test having critical region C = {X > 3}, on the basis of one sample observation.

Soln: We have α = P{Reject H₀ | θ = θ₀}

= P{X > 3 | θ = 1} = ∫₃^∞ e^{−x} dx = e^{−3}

β = P{Accept H₀ | θ = θ₁}

= 1 − P{X > 3 | θ = 4} = 1 − (1/4)∫₃^∞ e^{−x/4} dx = 1 − e^{−3/4}

Power = 1 − β = e^{−3/4}

Example: Let X have the rectangular distribution

f(x, θ) = 1/θ, 0 ≤ x ≤ θ

It is required to test the hypothesis

H₀: θ = 1 against H₁: θ = 2.

Suppose one observation is taken and the tests have the critical regions (a) C₁ = {x ≥ 0.7} and (b) C₂ = {0.8 ≤ x ≤ 1.3}. Obtain the probabilities of the two types of error, α and β.

Soln: (a) C₁ = {x ≥ 0.7}

α = P{Reject H₀ | θ = θ₀} = P{X ≥ 0.7 | θ = 1} = ∫_{0.7}^1 1 dx = 0.3

β = P{Accept H₀ | θ = θ₁} = P{X < 0.7 | θ = 2} = ∫₀^{0.7} (1/2) dx = 0.35

(b) C₂ = {0.8 ≤ x ≤ 1.3}

α = P{0.8 ≤ X ≤ 1.3 | θ = 1} = ∫_{0.8}^1 1 dx = 0.2

1 − β = P{0.8 ≤ X ≤ 1.3 | θ = 2} = ∫_{0.8}^{1.3} (1/2) dx = 0.25

or β = 0.75.

Example: Let X have a Binomial distribution B(10, p), for which

f(x, p) = (10 choose x) pˣ(1 − p)^{10−x}, x = 0, 1, …, 10

One observation X is taken for testing H₀: p = 1/2 against H₁: p = 1/4. Find α and β for the test which rejects H₀ when X ≤ 3.

Soln: α = P{X ≤ 3 | p = 1/2} = Σ_{x=0}^{3} (10 choose x)(1/2)ˣ(1/2)^{10−x} = (1 + 10 + 45 + 120)/2¹⁰ = 11/64

β = 1 − P{X ≤ 3 | p = 1/4} = 1 − Σ_{x=0}^{3} (10 choose x)(1/4)ˣ(3/4)^{10−x} = 1 − 31·3⁸/4⁹
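These error probabilities are easy to check numerically; the sketch below (illustrative only) uses the binomial c.d.f.:

```python
from scipy.stats import binom

alpha = binom.cdf(3, 10, 0.5)            # P(X <= 3 | p = 1/2)
beta = 1 - binom.cdf(3, 10, 0.25)        # P(X > 3 | p = 1/4)
print("alpha =", alpha, "= 11/64 =", 11 / 64)
print("beta  =", beta, "= 1 - 31*3**8/4**9 =", 1 - 31 * 3 ** 8 / 4 ** 9)
```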

Example: Let X have a Poisson distribution P(λ), and it is required to test the hypothesis H₀: λ = 1 vs H₁: λ = 2. One observation is taken and a test is considered which rejects H₀ when X ≥ 3. Find α, β.

Soln: We have α = P{X ≥ 3 | λ = 1}

= 1 − Σ_{x=0}^{2} e^{−1}/x! = 1 − [1/e + 1/e + 1/(2e)] = 1 − 5/(2e)

β = P{X < 3 | λ = 2}

= Σ_{x=0}^{2} e^{−2}2ˣ/x! = (1/e²)[1 + 2 + 2] = 5/e²

Now we are in a position to prove a theorem which helps us to obtain MP tests of a simple hypothesis against a simple alternative. In some special situations, this also gives a UMP test when the alternative is composite.

Let us suppose that we are testing a simple hypothesis against a simple alternative

H0: θ = θ0
vs H1: θ = θ1 (≠ θ0)

Theorem (Neyman–Pearson Lemma)

Let the likelihood of the sample E = (X1, ..., Xn) under H0 and H1 be

L(θj) = L(θj; X1, ..., Xn) = ∏_{i=1}^{n} f(Xi, θj), j = 0, 1.

Let T* be a test of size α for which the critical region W* is defined by

W* = {E : L(θ1)/L(θ0) ≥ c},

where c is a constant determined by the size condition

P{E ∈ W* | θ0} = α.

Then T* is an MP test of size α for testing H0 against H1.

Proof: Let us write L0 = L(θ0) and L1 = L(θ1), so that the size and power of any test T with critical region W are as follows:

size of T = ∫_W L0 dx and power of T = ∫_W L1 dx,

where dx = dx1 dx2 ... dxn.

Consider the test T* (having critical region W*) and any other test T (having critical region W). Since both W* and W are of size α, we have

∫_{W*} L0 dx = α = ∫_W L0 dx   — (i)

Let W1 = W* − W*∩W, W2 = W*∩W, W3 = W − W*∩W.

Using (i), we have

∫_{W1} L0 dx = ∫_{W*} L0 dx − ∫_{W2} L0 dx = ∫_W L0 dx − ∫_{W2} L0 dx = ∫_{W3} L0 dx   — (ii)

Since W1 ⊂ W* and W3 ⊄ W*, we have, by definition of W* and using (ii),

∫_{W1} L1 dx ≥ c ∫_{W1} L0 dx   — (iii)

and ∫_{W3} L1 dx < c ∫_{W3} L0 dx = c ∫_{W1} L0 dx   — (iv)

Therefore, from (iii) and (iv), we get

∫_{W1} L1 dx ≥ ∫_{W3} L1 dx   — (v)

Adding ∫_{W2} L1 dx on both sides of (v) we get

∫_{W1∪W2} L1 dx ≥ ∫_{W3∪W2} L1 dx

or ∫_{W*} L1 dx ≥ ∫_W L1 dx

or P_{T*}(Rej H0 | θ = θ1) ≥ P_T(Rej H0 | θ = θ1)

or P_{T*}(θ1) ≥ P_T(θ1),

which shows that T* is more powerful than any other test T of size α. Hence T* is the MP test.

Remarks: (1) The constant c for the MP test is determined by using the size condition

∫_{W*} L0 dx = α.

Usually, a unique value of c is obtained when the r.v. has a continuous distribution.

(2) When X is a discrete r.v. the constant c may not be unique. What is more important is that we may not be able to find an MP critical region of exact size α. To get rid of the difficulty the critical region is defined by the following:

Reject H0 if L(θ1)/L(θ0) > c
Reject H0 with probability r if L(θ1)/L(θ0) = c
Accept H0 if L(θ1)/L(θ0) < c

Then the size of the test is

P_{θ0}{L(θ1)/L(θ0) > c} + r·P_{θ0}{L(θ1)/L(θ0) = c} = α.

For any given α, r can be determined. Such a test is called a randomized test.

Example: Let (x1, ..., x5) be a random sample from the Bernoulli distribution

f(x, θ) = θ^x (1 − θ)^{1−x}, x = 0, 1 (0 < θ < 1).

Let us test H0: θ = 0.6 vs H1: θ = θ1 (> 0.6). The MP test has critical region {Σ_{i=1}^{5} xi ≥ c}.

Now Σ_{i=1}^{5} xi has the binomial distribution B(5, θ).

From the tables of the binomial distribution we can tabulate P{Σ xi = c | θ = 0.6} and P{Σ xi ≥ c | θ = 0.6} as follows:

c     P(Σxi = c)    P(Σxi ≥ c)
0     0.01024       1.00000
1     0.07680       0.98976
2     0.23040       0.91296
3     0.34560       0.68256
4     0.25920       0.33696
5     0.07776       0.07776

As such, no non-randomized MP test of exact size α = 0.05 or 0.01 exists. However, the randomized MP test of size 0.35 is given by

Reject H0 if Σ_{i=1}^{5} xi > 3
Reject H0 with probability 0.01304/0.34560 if Σ_{i=1}^{5} xi = 3
Accept H0 if Σ_{i=1}^{5} xi < 3
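The cut-off and the randomisation probability in a construction of this kind can be found mechanically. The sketch below is an added illustration (scipy assumed; the target size 0.35 simply matches the example above): it locates the smallest c with P(Σxi > c) ≤ α and then the boundary probability r.

from scipy.stats import binom

n, theta0, alpha = 5, 0.6, 0.35

# Smallest c such that P(T > c | theta0) <= alpha, where T = sum of xi ~ B(n, theta0)
c = 0
while binom.sf(c, n, theta0) > alpha:   # sf(c) = P(T > c)
    c += 1

# Randomisation probability on the boundary T = c
r = (alpha - binom.sf(c, n, theta0)) / binom.pmf(c, n, theta0)

print(c, r)   # c = 3, r = 0.01304/0.34560, about 0.0377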

(3) Suppose we test the simple hypothesis H0: θ = θ0 against a composite alternative H1: θ ≠ θ0, or H1: θ > θ0, or H1: θ < θ0. If the MP test for H0: θ = θ0 against H1: θ = θ1 given by the NP lemma does not depend on θ1, the same test will be MP for all alternative values of θ and, therefore, it will be a UMP test.

Example (1): Let X have a Poisson distribution P(λ) having p.m.f.

f(x, λ) = e^{−λ} λ^x / x!, x = 0, 1, 2, ...

We want to test H0: λ = λ0
against H1: λ = λ1 (> λ0).

We have

L(λ) = ∏_{i=1}^{n} f(xi, λ) = e^{−nλ} λ^{Σxi} / ∏_{i=1}^{n} xi!

Therefore, the MP test has the critical region W given by

W = {L(λ1)/L(λ0) ≥ c}, i.e. inside W we have

L(λ1)/L(λ0) = e^{−n(λ1−λ0)} (λ1/λ0)^{Σxi} ≥ c

or −n(λ1 − λ0) + (Σxi) log(λ1/λ0) ≥ log c

or Σ_{i=1}^{n} xi ≥ k,

where k = [log c + n(λ1 − λ0)] / log(λ1/λ0)

∴ W = {Σ xi ≥ k}

We know that Σ_{i=1}^{n} xi has the Poisson distribution P(nλ), so k can be determined by solving

P(Σ_{i=1}^{n} xi ≥ k | λ = λ0) = α.

Remarks: (i) When λ1 < λ0, the MP test will be given by {Σ xi ≤ k}.

(ii) Since the critical region does not depend on the value of λ1, these tests are UMP for the alternatives H1: λ > λ0 and H1: λ < λ0, respectively.

(iii) For getting an MP test of exact size α we may have to use a randomized test.
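For a concrete size calculation, k can be read off the Poisson(nλ0) distribution. The following minimal sketch is an added illustration (scipy assumed; the values of n, λ0 and α are arbitrary, not from the notes):

from scipy.stats import poisson

n, lam0, alpha = 10, 1.0, 0.05

# Smallest k with P(sum xi >= k | lambda0) <= alpha, where sum xi ~ Poisson(n*lambda0)
k = 0
while poisson.sf(k - 1, n * lam0) > alpha:   # sf(k-1) = P(T >= k) for integer T
    k += 1

size = poisson.sf(k - 1, n * lam0)
print(k, size)   # to attain size alpha exactly one would randomise on T = k - 1, as in the remark above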

(2) Let X have an exponential distribution

f(x, θ) = θ e^{−θx} (x ≥ 0).

We want to test H0: θ = θ0
vs H1: θ = θ1 (< θ0).

We have L(θ) = ∏_{i=1}^{n} f(xi, θ) = θ^n e^{−θ Σxi}.

Therefore, the MP test has the critical region W = {L(θ1)/L(θ0) ≥ c}, i.e. inside W

(θ1/θ0)^n e^{−(θ1−θ0) Σxi} ≥ c.

Since θ1 < θ0, the left-hand side is an increasing function of Σxi, so the critical region reduces to {Σ_{i=1}^{n} xi ≥ k}, where k is obtained from the size condition, Σ xi having a gamma distribution under H0.

(3) Let X have a normal distribution N(μ, σ) where σ is a known constant. We want to test

H0: μ = μ0
vs H1: μ = μ1 (> μ0).

We have L(μ) = (1/(σ√(2π))^n) e^{−Σ(xi−μ)²/2σ²}.

Therefore, the MP test has the critical region W defined by

W = {L(μ1)/L(μ0) ≥ c},

i.e. inside W

L(μ1)/L(μ0) = e^{−[Σ(xi−μ1)² − Σ(xi−μ0)²]/2σ²} ≥ c

or Σ(xi−μ0)² − Σ(xi−μ1)² ≥ 2σ² log c

or 2(μ1 − μ0) Σ xi ≥ 2σ² log c + n(μ1² − μ0²)

or x̄ ≥ σ² log c / [n(μ1 − μ0)] + (μ1 + μ0)/2   (since μ1 > μ0)

or x̄ ≥ k,

where k = the r.h.s.

∴ The MP test is given by W = {x̄ ≥ k}. Since x̄ ~ N(μ, σ/√n), we can determine k by solving

P_{μ0}{x̄ ≥ k} = α

or P_{μ0}{(x̄ − μ0)/(σ/√n) ≥ (k − μ0)/(σ/√n)} = α

or P{Z ≥ (k − μ0)/(σ/√n)} = α.

Under H0, Z has N(0,1), and the tables of the standard normal distribution provide the value k_α such that

P[Z ≥ k_α] = α.

(k_α is called the upper α% point of N(0,1); −k_α is the lower α% point.) Then

k_α = (k − μ0)/(σ/√n), or k = μ0 + k_α σ/√n.

Remarks: (i) The power of the MP test given above is

P_{μ1}{x̄ ≥ k} = P_{μ1}{(x̄ − μ1)/(σ/√n) ≥ (k − μ1)/(σ/√n)} = P{Z ≥ √n(μ0 − μ1)/σ + k_α}.

Since (μ0 − μ1) < 0, this shows that the power is an increasing function of n.

(ii) If μ1 < μ0, the MP test can be shown to have the critical region {x̄ ≤ k} where k = μ0 + k′_α σ/√n, such that P{Z ≤ k′_α} = α for a standard normal r.v. (in fact k′_α = −k_α).

(iii) We observe that the MP test of H0: μ = μ0 vs H1: μ = μ1 (> μ0) has a critical region which does not depend on μ1, so the same test will be UMP for testing H0: μ = μ0 against H1: μ > μ0. Similarly, the MP test of H0: μ = μ0 against H1: μ = μ1 (< μ0) is UMP for testing H0: μ = μ0 against H1: μ < μ0.

However, it can be shown that there is no test which is UMP for H0: μ = μ0 against H1: μ ≠ μ0.
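A short numerical sketch of the test in example (3) is given below. It is an added illustration (numpy/scipy assumed); the values of μ0, σ, n and α are arbitrary, not from the notes.

import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
k_alpha = norm.ppf(1 - alpha)                 # upper alpha point of N(0,1)
k = mu0 + k_alpha * sigma / np.sqrt(n)        # reject H0: mu = mu0 when xbar >= k

def power(mu1):
    # P_{mu1}(xbar >= k) = P(Z >= sqrt(n)(mu0 - mu1)/sigma + k_alpha)
    return norm.sf(np.sqrt(n) * (mu0 - mu1) / sigma + k_alpha)

print(k, power(0.5))     # the power grows with n and with mu1 - mu0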

(4) Let X have a normal distribution N(μ, σ) where μ is a known constant.

We want to test

H0: σ = σ0
vs H1: σ = σ1 (> σ0).

We have

L(σ) = (1/((2π)^{n/2} σ^n)) e^{−Σ(xi−μ)²/2σ²}.

Therefore the MP test has the critical region W defined by W = {L(σ1)/L(σ0) ≥ c},

i.e. inside W

L(σ1)/L(σ0) = (σ0/σ1)^n e^{−(1/2) Σ(xi−μ)² (1/σ1² − 1/σ0²)} ≥ c

or Σ(xi−μ)² (1/(2σ0²) − 1/(2σ1²)) ≥ log c + n log(σ1/σ0)

or Σ(xi−μ)² ≥ k (since σ1 > σ0),

where k = 2[log c + n log(σ1/σ0)] / (1/σ0² − 1/σ1²).

∴ The MP test critical region is given by W = {Σ_{i=1}^{n}(xi−μ)² ≥ k}.

Since Σ_{i=1}^{n}(xi−μ)²/σ² ~ χ²(n), we can determine k by solving

P_{σ0}{Σ(xi−μ)² ≥ k} = α

or P_{σ0}{Σ(xi−μ)²/σ0² ≥ k/σ0²} = α

or P{Y ≥ k/σ0²} = α,

where Y ~ χ²(n).

From the table of χ²(n) we can find k_α such that P{Y ≥ k_α} = α, so that k = σ0² k_α.

Remarks: (i) The power of the test is given by

P_{σ1}{Σ(xi−μ)² ≥ k} = P_{σ1}{Σ(xi−μ)²/σ1² ≥ k/σ1²} = P{Y ≥ (σ0²/σ1²) k_α},

where Y ~ χ²(n).

(ii) If σ1 < σ0, the MP test can be shown to have the critical region {Σ(xi−μ)² ≤ k′}.

(iii) Since the MP test of H0: σ = σ0 vs H1: σ = σ1 (> σ0) does not depend on σ1, it is UMP for testing H0: σ = σ0 against H1: σ > σ0. Similarly the MP test for H0: σ = σ0 against H1: σ = σ1 (< σ0) is UMP for testing H0: σ = σ0 against H1: σ < σ0.

However, no UMP test exists for the alternative H1: σ ≠ σ0.
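The constants k and the power of this variance test come straight from χ² tables. A minimal sketch follows; it is an added illustration (scipy assumed), and the values of n, σ0, σ1, α are arbitrary.

from scipy.stats import chi2

n, sigma0, sigma1, alpha = 20, 1.0, 1.5, 0.05

k_alpha = chi2.ppf(1 - alpha, df=n)          # P(Y >= k_alpha) = alpha, Y ~ chi-square on n d.f.
k = sigma0**2 * k_alpha                      # reject H0 when sum (xi - mu)^2 >= k

power = chi2.sf((sigma0**2 / sigma1**2) * k_alpha, df=n)
print(k, power)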

(5) Let X have the distribution with p.d.f.

f(x, θ) = θ x^{θ−1} (0 ≤ x ≤ 1).

We want to test

H0: θ = θ0
against H1: θ = θ1 (> θ0).

We have L(θ) = θ^n [∏ xi]^{θ−1}.

Therefore the MP test has the critical region W = {L(θ1)/L(θ0) ≥ c}, i.e. inside W

(θ1/θ0)^n [∏_{i=1}^{n} xi]^{θ1−θ0} ≥ c

or ∏_{i=1}^{n} xi ≥ k, where k = [c (θ0/θ1)^n]^{1/(θ1−θ0)}.

∴ The MP test has critical region

{∏ xi ≥ k}

or {−Σ_{i=1}^{n} log xi ≤ k0}, where k0 = −log k.

It can be shown that Y = −2θ (Σ_{i=1}^{n} log xi) has the χ²(2n) distribution; therefore the constant k0 (and hence k) can be determined by solving

P{Y ≤ 2θ0 k0} = α,

where Y ~ χ²(2n).

Remark: In the same manner the MP test for H0: θ = θ0 against H1: θ = θ1 (< θ0) can be found.

(6) Suppose X has specified distributions f0(x) and f1(x) under H0 and H1, with X ~ N(0,1) under H0. Based on a single observation, the MP test has the critical region

{x : f1(x)/f0(x) ≥ c},

which here takes the form

√(2/π) · e^{x²/2}/(1 + x²) ≥ c.

Since the L.H.S. is non-decreasing in |x|, the critical region is {|x| ≥ k}, where k is determined from the size condition

P_{H0}{|X| ≥ k} = α.

Since X ~ N(0,1) under H0, k = z_{α/2}.

(7) Suppose X has specified distributions under H0 and H1 (again with X ~ N(0,1) under H0), for which the critical region is

{x : √(π/2) e^{x²/2 − |x|} ≥ C}.

Since f1(x)/f0(x) is a non-decreasing function of |x|, the critical region is {|x| ≥ k}, where k = z_{α/2}.

(8) Suppose X has the following distributions:

H0: f0(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞

H1: f1(x) = (2/Γ(1/4)) e^{−x⁴}, −∞ < x < ∞

Let us take a single observation. The MP test of H0 vs H1 has the critical region

{x : f1(x)/f0(x) ≥ C}

or e^{−x⁴ + x²/2} ≥ C′.

Since the L.H.S. is a non-increasing function of |x|, the critical region is {|x| ≤ k}, where k = z_{(1−α)/2}.

(9) Suppose X has the following distributions:

H0: f0(x) = 4x for 0 < x < 1/2; = 4(1 − x) for 1/2 ≤ x < 1

H1: f1(x) = 1, 0 < x < 1

Let us take a single observation. The MP test of H0 vs H1 has the critical region given by

f1(x)/f0(x) ≥ C,

where

f1(x)/f0(x) = 1/(4x) for 0 < x < 1/2
            = 1/(4(1 − x)) for 1/2 ≤ x < 1.

We see that f1(x)/f0(x) ≥ C if either x < k1 or x > k2.

Hence the MP critical region is

{x < k1} ∪ {x > k2}.

The size of the test is P_{H0}{x < k1} + P_{H0}{x > k2} = α.

For simplicity we can take k2 = 1 − k1.

(10) Let X have the rectangular distribution R(0, θ) having p.d.f.

f(x, θ) = 1/θ ; 0 ≤ x ≤ θ.

We want to test

H0: θ = θ0 vs
H1: θ = θ1 (> θ0).

We have

L(θ) = (1/θⁿ) I_{[0,θ]}(x_(n)),

where x_(n) is the largest observation.

Therefore the MP test has the critical region W = {L(θ1)/L(θ0) ≥ C}.

Now,

L(θ1)/L(θ0) = (θ0/θ1)ⁿ I_{[0,θ1]}(x_(n)) / I_{[0,θ0]}(x_(n))
            = (θ0/θ1)ⁿ for 0 ≤ x_(n) ≤ θ0
            = ∞ for θ0 < x_(n) ≤ θ1.

This shows that L(θ1)/L(θ0) is a non-decreasing function of x_(n) and, therefore,

L(θ1)/L(θ0) ≥ C ⟺ x_(n) ≥ k.

Hence the MP test has the critical region

{x_(n) ≥ k}.

The value of k is determined by the size condition

P{x_(n) ≥ k | θ0} = α.

Since x_(n) has p.d.f. f_{x(n)}(y) = n y^{n−1}/θⁿ ; 0 ≤ y ≤ θ,

we have (n/θ0ⁿ) ∫_k^{θ0} y^{n−1} dy = α, i.e. 1 − (k/θ0)ⁿ = α.

Remark: The above test is UMP for H0: θ = θ0 against H1: θ > θ0.
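The size condition here has the closed form k = θ0(1 − α)^{1/n}. The sketch below is an added illustration (numpy assumed, with arbitrary values of θ0, θ1, n, α) that also evaluates the power under θ1.

import numpy as np

theta0, theta1, n, alpha = 1.0, 2.0, 10, 0.05

k = theta0 * (1 - alpha) ** (1 / n)          # reject H0 when x_(n) >= k

# Power: P(x_(n) >= k | theta1) = 1 - (k/theta1)^n, valid since k <= theta1
power = 1 - (k / theta1) ** n
print(k, power)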

As we have remarked, a UMP test may not always exist. Therefore we further restrict the class of tests by considering unbiased tests (defined below) and then try to obtain a UMP test within the class of unbiased tests. If such a test exists we call it a uniformly most powerful unbiased test (UMPU test).

Definition: Suppose we are testing a simple hypothesis H0: θ = θ0 against a composite alternative H1 (may be θ ≠ θ0 or θ > θ0 or θ < θ0). A test T is called unbiased if

P_θ(T) ≥ α for all θ ∈ H1,

where α is the size of T, i.e. P_{θ0}(T) = α.

Remark: Suppose θ = θ1 is one of the alternative values of θ. If the test is not unbiased it may happen that P_{θ1}(T) < α = P_{θ0}(T), which means that the probability of rejecting H0 when it is false is less than the probability of rejecting H0 when it is true. If the test is unbiased this cannot happen.

Theorem: An MP test or UMP test is unbiased.

Proof: Let T* be an MP (or UMP) test of size α. Consider another test T which rejects the null hypothesis H0: θ = θ0 with probability α irrespective of the sample outcome. (We may just toss a coin for which the probability of heads is α and decide to reject H0 if we get a head, irrespective of the sample values obtained.) Then

P_T{Reject H0 | H0 is true} = α,

so that the size of the test T is α. Also the power of the test T is α, since

P_T{Reject H0 | H0 is false} = α.

But T*, being MP (or UMP), is such that

P_{T*}(θ) ≥ P_T(θ) for θ ∈ H1

or P_{T*}(θ) ≥ α for θ ≠ θ0.

Remark: It may be shown that, for the two-sided alternative H1: θ ≠ θ0, UMPU tests exist in the examples above; they are two-sided regions of the form {T ≥ k1 or T ≤ k2} in the relevant statistic, e.g.

for the normal mean, the UMPU test is {x̄ ≥ k1 or x̄ ≤ k2};

for the normal variance, the UMPU test is {Σ(xi − μ)² ≥ k1 or Σ(xi − μ)² ≤ k2}.

The constants k, k1, k2 are determined from the size condition.

Now we consider a procedure for constructing tests that has some intuitive appeal and that frequently, though not always, leads to UMP or UMPU tests. The procedure also leads to tests that have desirable large-sample properties.

Suppose we are given a sample (x1, ..., xn) from a distribution with p.d.f. f(x, θ) (where θ may be a vector) and we desire to test the null hypothesis H0: θ ∈ ω (⊂ Ω) against the alternative hypothesis H1: θ ∈ Ω − ω, where Ω is the parameter space.

The likelihood function of the sample is given by

L(θ) = L(θ; x1, ..., xn) = ∏_{i=1}^{n} f(xi, θ).

Define the likelihood ratio

λ = max_{θ∈ω} L(θ) / max_{θ∈Ω} L(θ),

where max_{θ∈ω} L(θ) denotes the maximum of the likelihood function when θ is restricted to values in ω, and max_{θ∈Ω} L(θ) denotes the maximum of the likelihood function when θ takes all possible values in Ω.

Obviously, 0 ≤ λ ≤ 1, and λ is close to 1 if the sample shows that θ actually lies in ω.

Definition: The likelihood ratio test of H0 against H1 has the critical region

W = {λ ≤ λ0},

where λ0 is determined by the size condition

sup_{θ∈H0} P{λ ≤ λ0 | θ ∈ H0} = α.

Remarks: (1) For testing a simple hypothesis against a simple alternative the likelihood ratio test is equivalent to the test given by the Neyman–Pearson lemma.

(ii) If a sufficient statistic exists, the L.R. test is a function of the sufficient statistic.

(iii) Under some regularity conditions, −2 log_e λ is asymptotically distributed as a χ² r.v. with degrees of freedom equal to the difference between the number of independent parameters in Ω and in ω.

Example: (1) Let X be a r.v. having a normal distribution N(μ, σ) where σ (= σ0) is known.

We want to test H0: μ = μ0
against H1: μ ≠ μ0.

We have the likelihood function

L(μ) = (1/(σ0√(2π))ⁿ) e^{−Σ_{i=1}^{n}(xi−μ)²/2σ0²}.

Then

max_{H0} L(μ) = (1/(σ0√(2π))ⁿ) e^{−Σ(xi−μ0)²/2σ0²}.

Since the MLE of μ is μ̂ = x̄, therefore

max_{μ} L(μ) = (1/(σ0√(2π))ⁿ) e^{−Σ(xi−x̄)²/2σ0²}.

The LR test critical region is given by λ ≤ λ0:

max_{H0} L(μ) / max_{μ} L(μ) ≤ λ0

or e^{−Σ(xi−μ0)²/2σ0²} / e^{−Σ(xi−x̄)²/2σ0²} ≤ λ0

or e^{[Σ(xi−x̄)² − Σ(xi−μ0)²]/2σ0²} ≤ λ0

or −n(x̄−μ0)²/2σ0² ≤ log λ0

or n(x̄−μ0)²/σ0² ≥ k

or |x̄ − μ0| / (σ0/√n) ≥ k′.

Remark: (i) The above test is not a UMP test, since there exist other tests that are UMP for H1: μ > μ0 and for H1: μ < μ0. (ii) √n(x̄−μ0)/σ0 ~ N(0, 1) under H0, so k′ can be found easily by using the size condition.

(2) Let X ~ N(μ, σ) where both μ and σ are unknown. We want to test

H0: μ = μ0
against H1: μ ≠ μ0.

We have the likelihood function

L(μ, σ) = (1/(σ√(2π))ⁿ) e^{−Σ(xi−μ)²/2σ²}.

Under H0, μ = μ0 (given), so the MLE of σ is

σ̂0 = √(Σ(xi−μ0)²/n).

In general, the MLE of μ is μ̂ = x̄ and the MLE of σ is

σ̂ = s0 = √(Σ(xi−x̄)²/n).

Therefore we have

max_{H0} L(μ, σ) = (1/(σ̂0√(2π))ⁿ) e^{−Σ(xi−μ0)²/2σ̂0²} = (1/(σ̂0√(2π))ⁿ) e^{−n/2}

and max_{μ,σ} L(μ, σ) = (1/(s0√(2π))ⁿ) e^{−Σ(xi−x̄)²/2s0²} = (1/(s0√(2π))ⁿ) e^{−n/2}.

The L.R. test critical region is given by

λ = max_{H0} L(μ, σ) / max_{μ,σ} L(μ, σ) ≤ λ0

or (s0/σ̂0)ⁿ ≤ λ0

or σ̂0²/s0² ≥ k.

Since σ̂0² = (x̄−μ0)² + s0², the above critical region becomes

(x̄−μ0)²/s0² ≥ k′

or √n |x̄−μ0| / s ≥ k″,

where s² = Σ(xi−x̄)²/(n−1) = n s0²/(n−1).

It is known that √n(x̄−μ0)/s has the t distribution on (n−1) d.f. under H0. Therefore the value of k″ can be found from the size condition

P{|Y| ≥ k″} = α,

where Y ~ t(n−1).
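The test just derived is the ordinary one-sample t-test. A compact sketch follows as an added illustration (numpy/scipy assumed; the data are invented): it computes the statistic from the formula above and checks it against scipy's implementation.

import numpy as np
from scipy import stats

x = np.array([4.8, 5.3, 5.1, 4.7, 5.6, 5.0, 4.9, 5.4])
mu0 = 5.0

n = len(x)
t_stat = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)   # s uses the (n-1) divisor
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)            # two-sided test

res = stats.ttest_1samp(x, mu0)                             # should agree
print(t_stat, p_value, res.statistic, res.pvalue)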

(3) Let X ~ N(μ, σ) where both μ and σ are unknown. We want to test

H0: σ = σ0
against H1: σ ≠ σ0.

We have the likelihood function

L(μ, σ) = (1/(σ√(2π))ⁿ) e^{−Σ(xi−μ)²/2σ²}.

Under H0, the MLE of μ is μ̂ = x̄.

In general, the MLE of μ is μ̂ = x̄ and the MLE of σ is

σ̂ = s0 = √(Σ(xi−x̄)²/n).

Then we have

max_{H0} L(μ, σ) = (1/(σ0√(2π))ⁿ) e^{−Σ(xi−x̄)²/2σ0²} = (1/(σ0√(2π))ⁿ) e^{−n s0²/2σ0²}

and max_{μ,σ} L(μ, σ) = (1/(s0√(2π))ⁿ) e^{−Σ(xi−x̄)²/2s0²} = (1/(s0√(2π))ⁿ) e^{−n/2}.

The L.R. test critical region is given by

λ = max_{H0} L(μ, σ) / max_{μ,σ} L(μ, σ) ≤ λ0

or (s0²/σ0²)^{n/2} e^{−(n/2)(s0²/σ0² − 1)} ≤ λ0

or y^{n/2} e^{−(n/2)(y−1)} ≤ λ0, where y = s0²/σ0².

We note that y^{n/2} e^{−(n/2)(y−1)} has a maximum at y = 1.

Therefore λ ≤ λ0 if and only if y ≤ k1 or y ≥ k2; that is, the critical region is

{s0²/σ0² ≤ k1 or s0²/σ0² ≥ k2}

or {n s0²/σ0² ≤ k1′ or n s0²/σ0² ≥ k2′}.

But it is known that n s0²/σ0² = Σ_{i=1}^{n}(xi−x̄)²/σ0² has the χ² distribution on (n−1) d.f. Using the χ²(n−1) tables and the size condition we can get the values of k1′ and k2′.

(3a) Suppose in example (3) the value of μ (= μ0) is known. Then the L.R. critical region becomes

{n s0²/σ0² ≤ c1 or n s0²/σ0² ≥ c2},

where s0² = Σ_{i=1}^{n}(xi−μ0)²/n.

In that case n s0²/σ0² = Σ(xi−μ0)²/σ0² has χ²(n).

(4) Let X have an exponential distribution

f(x, θ) = (1/θ) e^{−x/θ} (x ≥ 0).

We want to test H0: θ ≤ θ0
against H1: θ > θ0.

We have the likelihood function

L(θ) = (1/θⁿ) e^{−Σxi/θ} = (1/θⁿ) e^{−nx̄/θ}.

Then we get

max_{H0} L(θ) = (1/θ0ⁿ) e^{−nx̄/θ0} for x̄ > θ0
             = (1/x̄ⁿ) e^{−n}    for x̄ ≤ θ0.

Also max_{θ} L(θ) = (1/x̄ⁿ) e^{−n},

because the MLE of θ is θ̂ = x̄.

The LR test critical region is given by λ ≤ λ0, where

λ = (x̄/θ0)ⁿ e^{−n(x̄/θ0 − 1)} for x̄ > θ0
  = 1                        for x̄ ≤ θ0.

Since yⁿ e^{−n(y−1)} attains its maximum at y = 1, taking y = x̄/θ0 we see that λ = 1 if y ≤ 1 and λ ≤ λ0 for y ≥ k (with k > 1).

∴ The LR test critical region becomes

{x̄/θ0 ≥ k} or {x̄ ≥ k′}.

Remarks: (i) If one takes H1: θ < θ0, we get the L.R. critical region as {x̄ ≤ k}. In both cases of one-sided alternatives the L.R. tests are UMP tests.

(2) Since Σ_{i=1}^{n} xi has a gamma distribution, we can find the value of k by using the size condition.
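In fact 2Σxi/θ has a χ²(2n) distribution when the xi are exponential with mean θ, so the cut-off for {x̄ ≥ k} can be taken from a χ² table. A minimal sketch (added illustration, scipy assumed, arbitrary values of n, θ0, α):

from scipy.stats import chi2

n, theta0, alpha = 15, 2.0, 0.05

# Under H0, 2*sum(xi)/theta0 = 2*n*xbar/theta0 ~ chi-square on 2n d.f.
k = theta0 * chi2.ppf(1 - alpha, df=2 * n) / (2 * n)   # reject when xbar >= k
print(k)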

(5) Let (x1, ..., x_{n1}) be a random sample from N(μ1, σ1) and (y1, ..., y_{n2}) be a random sample from another N(μ2, σ2), where the two samples (distributions) are independent.

We want to test

H0: μ1 = μ2
H1: μ1 ≠ μ2

where it is assumed that σ1 = σ2 (= σ, unknown). We have the likelihood function

L(μ1, μ2, σ) = (1/((√(2π))^{n1+n2} σ^{n1+n2})) e^{−[Σ(xi−μ1)² + Σ(yi−μ2)²]/2σ²}.

In general the MLEs of μ1, μ2 and σ are

μ̂1 = x̄ = (1/n1)Σxi, μ̂2 = ȳ = (1/n2)Σyi

and σ̂² = (n1 s1² + n2 s2²)/(n1 + n2) = s0² (say),

where s1² = (1/n1)Σ(xi−x̄)² and s2² = (1/n2)Σ(yi−ȳ)².

Therefore

max_{μ1,μ2,σ} L(μ1, μ2, σ) = (1/((2π)^{(n1+n2)/2} (s0²)^{(n1+n2)/2})) e^{−(n1+n2)/2}.

Again, the MLEs under H0 are

μ̂1 = μ̂2 = (n1 x̄ + n2 ȳ)/(n1 + n2) = m (say)

and σ̂0² = (1/(n1+n2)) [Σ(xi−m)² + Σ(yi−m)²]

= (1/(n1+n2)) [Σ{(xi−x̄)+(x̄−m)}² + Σ{(yi−ȳ)+(ȳ−m)}²]

= (1/(n1+n2)) [Σ(xi−x̄)² + n1(x̄−m)² + Σ(yi−ȳ)² + n2(ȳ−m)²]

= (1/(n1+n2)) [Σ(xi−x̄)² + Σ(yi−ȳ)² + (n1 n2/(n1+n2))(x̄−ȳ)²]

= s0² + (n1 n2/(n1+n2)²)(x̄−ȳ)².

Therefore

max_{H0} L(μ1, μ2, σ) = (1/((2π)^{(n1+n2)/2} (σ̂0²)^{(n1+n2)/2})) e^{−(n1+n2)/2},

so that the LR critical region is given by

λ = (s0²/σ̂0²)^{(n1+n2)/2} ≤ λ0

or σ̂0²/s0² ≥ k

or (x̄−ȳ)² / [(n1+n2) s0² (1/n1 + 1/n2)] ≥ k′

or (x̄−ȳ)² / [s²(1/n1 + 1/n2)] ≥ k″,

where s² = (n1 s1² + n2 s2²)/(n1+n2−2) = (n1+n2) s0²/(n1+n2−2).

The critical region can be written as

{|x̄ − ȳ| / (s√(1/n1 + 1/n2)) ≥ k}.

Since, under H0, (x̄ − ȳ)/(s√(1/n1 + 1/n2)) has the t distribution on (n1+n2−2) d.f., we can find k such that P{|Y| ≥ k} = α,

where Y ~ t(n1+n2−2).
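This is the familiar two-sample (pooled) t-test. The short sketch below is an added illustration (numpy/scipy assumed; the data are made up): it computes the statistic from the formula above and checks it against scipy's pooled-variance test.

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.8])
y = np.array([9.1, 9.7, 10.0, 9.4, 9.6, 9.2, 9.9])

n1, n2 = len(x), len(y)
sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)

res = stats.ttest_ind(x, y, equal_var=True)   # pooled-variance two-sample t-test
print(t_stat, p_value, res.statistic, res.pvalue)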

(6) Let (x1, ..., x_{n1}) be a random sample from N(μ1, σ1) and (y1, ..., y_{n2}) a random sample from N(μ2, σ2), where the two samples (and the two distributions) are independent.

We want to test

H0: σ1 = σ2
against H1: σ1 ≠ σ2.

We have the likelihood function

L(μ1, μ2, σ1, σ2) = (1/((2π)^{(n1+n2)/2} σ1^{n1} σ2^{n2})) e^{−[Σ(xi−μ1)²/2σ1² + Σ(yi−μ2)²/2σ2²]}.

In general, the MLEs of μ1, μ2, σ1, σ2 are

μ̂1 = x̄, μ̂2 = ȳ, σ̂1² = (1/n1)Σ(xi−x̄)² = s1² (say), σ̂2² = (1/n2)Σ(yi−ȳ)² = s2² (say),

so that

max L(μ1, μ2, σ1, σ2) = (1/((2π)^{(n1+n2)/2} (s1²)^{n1/2} (s2²)^{n2/2})) e^{−(n1+n2)/2}.

Again, the MLEs under H0 are

μ̂1 = x̄, μ̂2 = ȳ, σ̂1² = σ̂2² = σ̂² = (1/(n1+n2))[Σ(xi−x̄)² + Σ(yi−ȳ)²] = (n1 s1² + n2 s2²)/(n1+n2) = s² (say),

so that max_{H0} L(μ1, μ2, σ1, σ2) = (1/((2π)^{(n1+n2)/2} (s²)^{(n1+n2)/2})) e^{−(n1+n2)/2}.

Therefore, the LR critical region is given by

λ = (s1²)^{n1/2} (s2²)^{n2/2} / (s²)^{(n1+n2)/2} ≤ λ0

or (s1²)^{n1/2} (s2²)^{n2/2} / [(n1 s1² + n2 s2²)/(n1+n2)]^{(n1+n2)/2} ≤ λ0

or [((n1−1)/(n2−1)) f]^{n1/2} / [1 + ((n1−1)/(n2−1)) f]^{(n1+n2)/2} ≤ λ0′   — (i)

where f = [n1 s1²/(n1−1)] / [n2 s2²/(n2−1)].

Writing g(f) for the L.H.S. of (i), we have g(0) = 0 and g(f) → 0 as f → ∞. Furthermore, g(f) attains its maximum at f_max = n1(n2−1)/(n2(n1−1)); it is increasing between 0 and f_max and decreasing on (f_max, ∞).

Therefore g(f) ≤ λ0′ if and only if f < k1 or f > k2, so the LR critical region can be written as

{F < k1 or F > k2},

where F = [n1 s1²/(n1−1)] / [n2 s2²/(n2−1)].

But under H0, F ~ F(n1−1, n2−1).

Hence k1, k2 can be obtained from the size condition P{F < k1 or F > k2} = α, where F ~ F(n1−1, n2−1).
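A two-sided F-test of this kind can be carried out directly. The sketch below is an added illustration (numpy/scipy assumed; the data are invented) and uses the equal-tails choice of k1 and k2.

import numpy as np
from scipy import stats

x = np.array([2.1, 2.5, 1.9, 2.8, 2.3, 2.6, 2.2])
y = np.array([3.0, 2.4, 2.9, 3.3, 2.7, 3.1])

n1, n2 = len(x), len(y)
F = x.var(ddof=1) / y.var(ddof=1)                # ratio of unbiased sample variances
alpha = 0.05
k1 = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)       # lower cut-off
k2 = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)   # upper cut-off

reject = (F < k1) or (F > k2)
print(F, k1, k2, reject)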

Some distributions: χ², t and F

Definition: A r.v. X is said to have a Gamma distribution G(α, β) if its p.d.f. is given by

f(x) = (β^α/Γ(α)) x^{α−1} e^{−βx} ; x ≥ 0
     = 0 ; x < 0
(α > 0, β > 0)

We have m.g.f. M_X(t) = (1 − t/β)^{−α}, t < β,

E(X) = α/β
V(X) = α/β²

If α = 1 we get the exponential distribution

f(x) = β e^{−βx}, x ≥ 0 (β > 0)

E(X) = 1/β
V(X) = 1/β²

If α = n/2 (n a positive integer) and β = 1/2, we get the χ² distribution on n d.f., whose p.d.f. is

f(x) = (1/(2^{n/2} Γ(n/2))) x^{n/2−1} e^{−x/2}, x ≥ 0.

We have m.g.f. M_X(t) = (1 − 2t)^{−n/2}

E(X) = n
V(X) = 2n

Definition: A r.v. X is said to have a t-distribution on n d.f. if its p.d.f. is given by

f(x) = [Γ((n+1)/2)/(Γ(n/2)√(nπ))] (1 + x²/n)^{−(n+1)/2}, −∞ < x < ∞.

If X ~ N(0,1), Y ~ χ²(n) and X and Y are independent, then T = X/√(Y/n) has t(n).

Definition: A r.v. X is said to have an F-distribution on (m, n) d.f. if its p.d.f. is given by

f(x) = [Γ((m+n)/2)/(Γ(m/2)Γ(n/2))] (m/n)^{m/2} x^{m/2−1} (1 + (m/n)x)^{−(m+n)/2}, x ≥ 0
     = 0, x < 0.

If X ~ χ²(m) and Y ~ χ²(n), where X and Y are independent, then Z = (X/m)/(Y/n) has F(m, n).

Percentage points: The upper α-percent point of the χ²(n) distribution is χ²_{n,α}, where

P(χ²(n) > χ²_{n,α}) = α.

The upper α-percent point of the t(n) distribution is t_{n,α}, where

P(t(n) > t_{n,α}) = α.

Since the t-distribution is symmetrical,

P(|t(n)| > t_{n,α/2}) = α.

The upper α-percent point of the F(m, n) distribution is F_{m,n,α}, where

P(F(m, n) > F_{m,n,α}) = α.

Note that F_{m,n,1−α} = 1/F_{n,m,α}.

Use of χ², t and F distributions in testing problems

Use of the χ² distribution: (1) Testing the variance of a distribution: Given a sample (x1, ..., xn) of size n from a normal distribution N(μ, σ) where σ is unknown, we would like to test H0: σ = σ0 against the alternatives σ > σ0, σ < σ0 or σ ≠ σ0. The tests are summarised as follows.

Case I: μ known

Alternative        Reject H0 at level α if
H1: σ > σ0         Σ(xi−μ)²/σ0² ≥ χ²_{n,α}
H1: σ < σ0         Σ(xi−μ)²/σ0² ≤ χ²_{n,1−α}
H1: σ ≠ σ0         Σ(xi−μ)²/σ0² ≤ χ²_{n,1−α/2} or ≥ χ²_{n,α/2}

Case II: μ unknown

Alternative        Reject H0 at level α if
H1: σ > σ0         (n−1)s²/σ0² ≥ χ²_{n−1,α}
H1: σ < σ0         (n−1)s²/σ0² ≤ χ²_{n−1,1−α}
H1: σ ≠ σ0         (n−1)s²/σ0² ≤ χ²_{n−1,1−α/2} or ≥ χ²_{n−1,α/2}

where s² = (1/(n−1)) Σ_{i=1}^{n}(xi−x̄)².

(2) Testing proportions in k (> 2) classes: Suppose a r.v. takes values in one of k (> 2) mutually exclusive classes A1, ..., Ak with pi = P(X ∈ Ai), i = 1, 2, ..., k, Σ pi = 1. We want to test the hypothesis

H0: pi = p0i (i = 1, ..., k)

against H1: pi ≠ p0i for some i.

For a random sample (x1, ..., xn) of n observations, let the observed frequencies in the k classes be o1, o2, ..., ok (Σ oi = n) and the expected frequencies under H0 be e1, e2, ..., ek (Σ ei = n), where ei = n p0i. Calculate

χ² = Σ_{i=1}^{k} (oi − ei)²/ei.

Then, for large samples, χ² has χ²(k−1). The test of H0 has the critical region

χ² ≥ χ²_{k−1,α}.

Note: If we want to test H0: p1 = p2 = ... = pk, we take p0i = 1/k for every i.

(3) Testing goodness of fit: Given a sample (x1, ..., xn) of observations on a r.v. X arranged in the form of a frequency distribution having k classes A1, ..., Ak, we would like to test the hypothesis that the distribution of X has a specified form with p.d.f. (or p.m.f.) f0(x, θ); the parameter θ may be a single one or a vector (θ1, ..., θℓ).

Let the observed frequencies in the k classes be o1, o2, ..., ok, Σ oi = n, and the expected frequencies under H0 be e1, e2, ..., ek, Σ ei = n, such that ei = n P_{H0}(X ∈ Ai). Calculate

χ² = Σ_{i=1}^{k} (oi − ei)²/ei = Σ_{i=1}^{k} oi²/ei − n.

Then, for large samples, χ² has χ²(k−1). The test of H0 has the critical region

χ² ≥ χ²_{k−1,α}.

Note: If r (of the ℓ) parameters in θ are estimated from the sample, then χ² has χ²(k−r−1). If any expected frequency is less than 5 we pool that class with the adjoining class and denote by k the effective number of classes after pooling.
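A worked goodness-of-fit calculation is sketched below as an added illustration (numpy/scipy assumed; the observed frequencies are made up). It tests whether a six-sided die is fair, computing the statistic, the χ²_{k−1,α} cut-off and the p-value, and compares with scipy's built-in routine.

import numpy as np
from scipy import stats

observed = np.array([18, 22, 16, 25, 20, 19])           # k = 6 classes, n = 120
expected = np.full(6, observed.sum() / 6)                # H0: all p_i = 1/6

chi_sq = ((observed - expected) ** 2 / expected).sum()
critical = stats.chi2.ppf(0.95, df=len(observed) - 1)    # alpha = 0.05
p_value = stats.chi2.sf(chi_sq, df=len(observed) - 1)

# scipy's built-in version gives the same statistic and p-value
print(chi_sq, critical, p_value, stats.chisquare(observed, expected))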

(4) Testing independence of two attributes in a k×ℓ contingency table:

In a (k×ℓ) contingency table for two attributes, we want to test

H0: the two attributes are independent

against H1: the two attributes are not independent.

Let o_ij = observed frequency in the (i, j)th cell

and e_ij = expected frequency in the (i, j)th cell under H0 = (ith row total × jth column total)/n.

Calculate

χ² = Σ_{i=1}^{k} Σ_{j=1}^{ℓ} (o_ij − e_ij)²/e_ij = Σ_{i=1}^{k} Σ_{j=1}^{ℓ} o_ij²/e_ij − n,

where n = total frequency. Then χ² has χ² on (k−1)(ℓ−1) d.f. The test of H0 has the critical region

χ² ≥ χ²_{(k−1)(ℓ−1),α}.

(5) Testing the homogeneity of k (> 2) correlation coefficients:

Suppose r1, ..., rk are k sample correlation coefficients corresponding to k bivariate normal distributions with population correlation coefficients ρ1, ..., ρk. We want to test

H0: ρ1 = ρ2 = ... = ρk

vs H1: the correlation coefficients are not all equal.

We use Fisher's z-transformation of the correlation coefficient, given by

z = (1/2) log_e((1+r)/(1−r)), ξ = (1/2) log_e((1+ρ)/(1−ρ)),

so that, approximately,

z ~ N(ξ, 1/√(n−3)),

where n is the sample size.

We calculate z1, z2, ..., zk corresponding to r1, r2, ..., rk having sample sizes n1, n2, ..., nk and define

z̄ = Σ_{i=1}^{k}(ni−3)zi / Σ_{i=1}^{k}(ni−3)

and χ² = Σ_{i=1}^{k} (ni−3)(zi − z̄)².

Then χ² has χ² on (k−1) d.f., and the test of H0 has critical region

χ² ≥ χ²_{k−1,α}.

Remark: If H0 is accepted we may obtain an estimate of the common correlation coefficient ρ* (say) by solving

z̄ = (1/2) log_e((1+ρ*)/(1−ρ*)).

Uses of the t-distribution:

(1) Testing the mean of a single population: Let (x1, ..., xn) be a sample of size n from a normal population N(μ, σ²) and, as usual, let x̄ and s² be the sample mean and sample variance. We would like to test the null hypothesis H0: μ = μ0 against the alternatives μ > μ0, μ < μ0 or μ ≠ μ0. The test statistic is √n(x̄−μ0)/s, and H0 is rejected at level α if it is ≥ t_{n−1,α}, ≤ −t_{n−1,α}, or ≥ t_{n−1,α/2} in absolute value, for the three alternatives respectively.

(2) Testing the equality of two population means: Let (x1, ..., x_{n1}) and (y1, ..., y_{n2}) be two samples from independent normal populations N(μ1, σ1) and N(μ2, σ2), respectively. Let x̄, ȳ, s1², s2² be as usual and let

s² = [(n1−1)s1² + (n2−1)s2²] / (n1+n2−2)

be the pooled variance.

We would like to test H0: μ1 = μ2 against the alternatives μ1 > μ2, μ1 < μ2 or μ1 ≠ μ2. The tests are summarised as follows:

Case I: σ1, σ2 known

Alternative        Reject H0 at level α if
H1: μ1 > μ2        (x̄−ȳ)/√(σ1²/n1 + σ2²/n2) ≥ z_α
H1: μ1 < μ2        (x̄−ȳ)/√(σ1²/n1 + σ2²/n2) ≤ −z_α
H1: μ1 ≠ μ2        |x̄−ȳ|/√(σ1²/n1 + σ2²/n2) ≥ z_{α/2}

Case II: σ1, σ2 unknown (σ1 = σ2 assumed), the essential t-test situation

Alternative        Reject H0 at level α if
H1: μ1 > μ2        (x̄−ȳ)/(s√(1/n1 + 1/n2)) ≥ t_{n1+n2−2,α}
H1: μ1 < μ2        (x̄−ȳ)/(s√(1/n1 + 1/n2)) ≤ −t_{n1+n2−2,α}
H1: μ1 ≠ μ2        |x̄−ȳ|/(s√(1/n1 + 1/n2)) ≥ t_{n1+n2−2,α/2}

Remark: If we want to test H0: μ1 − μ2 = δ (≠ 0) we use the statistic

[(x̄ − ȳ) − δ] / (s√(1/n1 + 1/n2)).

Uses of the F-distribution:

(1) Testing equality of two population variances:

Let two samples of sizes n1 and n2 be given from two independent normal populations N(μ1, σ1) and N(μ2, σ2), respectively. Let s1², s2² be the two sample variances. We would like to test the null hypothesis H0: σ1 = σ2 against H1: σ1 ≠ σ2. The tests are as follows:

Case I: μ1, μ2 known

Reject H0 if either [Σ_{i=1}^{n1}(xi−μ1)²/n1] / [Σ_{i=1}^{n2}(yi−μ2)²/n2] ≥ F_{n1,n2,α/2}

or [Σ_{i=1}^{n2}(yi−μ2)²/n2] / [Σ_{i=1}^{n1}(xi−μ1)²/n1] ≥ F_{n2,n1,α/2}.

Case II: μ1, μ2 unknown

Reject H0 if either s1²/s2² ≥ F_{n1−1,n2−1,α/2} (if s1 > s2)
or s2²/s1² ≥ F_{n2−1,n1−1,α/2} (if s2 > s1).

(2) Testing the multiple correlation coefficient: Given a sample of size n from a trivariate normal population (x1, x2, x3) with multiple correlation coefficient R_{1(23)} of x1 on (x2, x3), we would like to test the null hypothesis H0: R_{1(23)} = 0. Let the sample multiple correlation coefficient be r_{1(23)}. The test is to reject H0 at level α if

[r²_{1(23)} / (1 − r²_{1(23)})] · [(n−3)/2] ≥ F_{2,n−3,α}.

(3) Testing the equality of means of k normal distributions (k > 2) [see left page]

Fisher's z-transformation of the correlation coefficient: Suppose a sample of size n is drawn from a bivariate population with correlation coefficient ρ between the variables. Fisher introduced the transformation

z = (1/2) log_e((1+r)/(1−r)),

where r is the sample correlation coefficient. Though the population correlation coefficient ρ may be widely different from zero, the new statistic z may be assumed to be normally distributed even when n is as small as 10. It has been shown that z has approximate mean

ξ = (1/2) log_e((1+ρ)/(1−ρ))

and approximate variance 1/(n−3), i.e.

√(n−3)(z − ξ) ~ N(0, 1).

(i) For testing H0: ρ = ρ0 against H1: ρ ≠ ρ0 we reject H0 if

√(n−3) |z − ξ0| ≥ N_{α/2},

where ξ0 = (1/2) log_e((1+ρ0)/(1−ρ0)) and N_α is the upper α% point of the normal distribution N(0, 1).

(ii) For testing H0: ρ1 = ρ2 against H1: ρ1 ≠ ρ2 involving two populations, let r1, r2 be the sample correlation coefficients for two independent samples of sizes n1, n2 from the two populations and let z1, z2 be their transformed values, i.e.

zi = (1/2) log_e((1+ri)/(1−ri)) (i = 1, 2).

The test is to reject H0 at level α if

|z1 − z2| / √(1/(n1−3) + 1/(n2−3)) ≥ N_{α/2}.

(iii) Let r1, r2, ..., rk be sample correlation coefficients for k samples of sizes n1, n2, ..., nk drawn from k independent bivariate normal populations with correlation coefficients ρ1, ρ2, ..., ρk. Let z1, ..., zk be the transformed values and let

z̄ = Σ_{i=1}^{k}(ni−3)zi / Σ_{i=1}^{k}(ni−3).

The test of H0: ρ1 = ... = ρk is to reject H0 at level α if

Σ_{i=1}^{k}(ni−3)(zi − z̄)² ≥ χ²_{k−1,α}.

If H0 is accepted, an estimate of the common correlation coefficient ρ is ρ*, where z̄ is the transformed value of ρ*.

Note: For large samples the sample proportion p̂ ~ N(p, √(p(1−p)/n)).

Large sample tests: So far we have considered tests of hypotheses in which assumptions regarding the population are satisfied. Now we consider some approximate tests which are valid only for sufficiently large samples; but they have wide applicability and hold for all populations satisfying certain general conditions, rather than being valid for some particular populations only (e.g. normal).

(i) Testing a proportion: Suppose P is the proportion of members in a population with a qualitative character A. Let p̂ be the proportion of members with A in a random sample of size n. We would like to test the hypothesis H0: P = P0. The test is to reject H0 at level α if

|p̂ − P0| / √(P0(1−P0)/n) ≥ N_{α/2}.

(ii) Testing the equality of two population proportions: Let P1, P2 be two population proportions and p̂1, p̂2 be the two sample proportions drawn from two independent populations. The test of H0: P1 = P2 is to reject H0 at level α if

|p̂1 − p̂2| / √(p̂(1−p̂)(1/n1 + 1/n2)) ≥ N_{α/2},

where

p̂ = (n1 p̂1 + n2 p̂2)/(n1 + n2).
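A minimal sketch of this two-proportion test follows as an added illustration (numpy/scipy assumed; the counts are invented):

import numpy as np
from scipy.stats import norm

x1, n1 = 45, 200        # successes / sample size in sample 1
x2, n2 = 30, 180        # successes / sample size in sample 2

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / np.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
p_value = 2 * norm.sf(abs(z))        # two-sided large-sample test

print(z, p_value)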

(iii) Testing for a standard deviation: Let s be the standard deviation of a sample of size n drawn from a population with standard deviation σ. The test of H0: σ = σ0 is to reject H0 at level α if

|s − σ0| / (σ0/√(2n)) ≥ N_{α/2}.

(iv) Testing for equality of two population standard deviations: Let s1, s2 be the standard deviations of two samples of sizes n1, n2 from two independent populations with standard deviations σ1, σ2. Let

s² = (n1 s1² + n2 s2²)/(n1 + n2).

The test of H0: σ1 = σ2 is to reject H0 at level α if

|s1 − s2| / (s√(1/(2n1) + 1/(2n2))) ≥ N_{α/2}.

Definition: For a random sample (x1, ..., xn) from the distribution of a r.v. X having p.d.f. f(x, θ), let L1 = L1(x1, ..., xn) and L2 = L2(x1, ..., xn) be two statistics such that L1 ≤ L2. The interval [L1, L2] is a confidence interval for θ with confidence coefficient 1−α (0 < α < 1) if P_θ[L1 ≤ θ ≤ L2] = 1−α for all θ ∈ Ω. L1 and L2 are called the lower and upper confidence limits, respectively; at least one of them should not be a constant.

Interval Estimation

Estimation of a parameter by a single sample value is known as point estimation. An alternative procedure is to give an interval within which the parameter may be supposed to lie with high probability. This is called interval estimation, and the interval is called the confidence interval for the parameter.

Suppose a r.v. X has the normal distribution N(μ, σ) with unknown mean μ and known standard deviation σ. Let (x1, ..., xn) be the values of a random sample of size n from this distribution. We know that the sample mean x̄ ~ N(μ, σ/√n) and, hence, √n(x̄ − μ)/σ ~ N(0, 1). It follows that

P{−1.96 ≤ √n(x̄ − μ)/σ ≤ 1.96} = 0.95

or, equivalently,

P{x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n} = 0.95.

This shows that, in repeated sampling, the probability is 0.95 that the interval

{x̄ − 1.96 σ/√n ; x̄ + 1.96 σ/√n}

will include μ. We say that the above is a confidence interval for μ with confidence coefficient 0.95. The two end points are known as 95% confidence limits for μ.

Let us now consider the general problem. Let a r.v. X have a distribution depending on an unknown parameter θ which is to be estimated. Suppose Z is a statistic (usually a function of a sufficient statistic if it exists) which is a function of θ but whose distribution does not depend on θ. Such a statistic Z is called a pivotal function. Let λ1 and λ2 be two numbers such that

P{λ1 ≤ Z ≤ λ2} = 1−α   — (1)

for a specified α (0 < α < 1).

The above inequality can be solved so that it assumes the form

P{θ1(x1, ..., xn) ≤ θ ≤ θ2(x1, ..., xn)} = 1−α

for all θ, where θ1 and θ2 are random variables which do not depend on θ. Finally, substituting the sample values, [θ1(x1, ..., xn), θ2(x1, ..., xn)] becomes a confidence interval for θ with the desired confidence coefficient 1−α.

Remark: The numbers λ1, λ2 may be chosen in several ways, giving rise to several confidence intervals. We usually choose the confidence interval of shortest length.

Example (1): X ~ N(μ, σ) where σ is known and μ is to be estimated.

Let Z = √n(x̄ − μ)/σ,

which has the N(0, 1) distribution. For a specified α let N_{α/2} be the upper α/2 % point of N(0, 1); then

P{−N_{α/2} ≤ √n(x̄ − μ)/σ ≤ N_{α/2}} = 1−α

or P{x̄ − N_{α/2} σ/√n ≤ μ ≤ x̄ + N_{α/2} σ/√n} = 1−α,

so that {x̄ − N_{α/2} σ/√n, x̄ + N_{α/2} σ/√n}

is a confidence interval for μ with confidence coefficient (1−α).

(2) X ~ N(μ, σ), σ unknown and μ to be estimated.

Let Z = √n(x̄ − μ)/s, where s² = (1/(n−1)) Σ_{i=1}^{n}(xi − x̄)².

Then Z has the t(n−1) distribution, so that for a specified α,

P{−t_{n−1,α/2} ≤ √n(x̄ − μ)/s ≤ t_{n−1,α/2}} = 1−α

or P{x̄ − t_{n−1,α/2} s/√n ≤ μ ≤ x̄ + t_{n−1,α/2} s/√n} = 1−α,

so that {x̄ − t_{n−1,α/2} s/√n, x̄ + t_{n−1,α/2} s/√n}

is a confidence interval for μ with confidence coefficient (1−α).
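As a computational sketch of example (2) — an added illustration, with invented data and numpy/scipy assumed:

import numpy as np
from scipy import stats

x = np.array([12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2])
alpha = 0.05

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half_width = t_crit * s / np.sqrt(n)

print(xbar - half_width, xbar + half_width)   # (1 - alpha) confidence interval for mu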

(3) X ~ N(μ, σ), μ known and σ is to be estimated.

Let Z = Σ_{i=1}^{n}(xi − μ)²/σ².

Then Z has the χ²(n) distribution, so that for a specified α

P{χ²_{n,1−α/2} ≤ Σ(xi − μ)²/σ² ≤ χ²_{n,α/2}} = 1−α

or P{Σ(xi − μ)²/χ²_{n,α/2} ≤ σ² ≤ Σ(xi − μ)²/χ²_{n,1−α/2}} = 1−α.

Therefore, a (1−α)% confidence interval of σ² is

{Σ(xi − μ)²/χ²_{n,α/2}, Σ(xi − μ)²/χ²_{n,1−α/2}}.

(4) X ~ N(μ, σ), μ unknown and σ is to be estimated.

Let Z = (n−1)s²/σ², where s² = (1/(n−1)) Σ_{i=1}^{n}(xi − x̄)².

Then Z has the χ²(n−1) distribution, so that

P{χ²_{n−1,1−α/2} ≤ (n−1)s²/σ² ≤ χ²_{n−1,α/2}} = 1−α

or P{(n−1)s²/χ²_{n−1,α/2} ≤ σ² ≤ (n−1)s²/χ²_{n−1,1−α/2}} = 1−α.

Therefore, a (1−α)% confidence interval of σ² is

{(n−1)s²/χ²_{n−1,α/2}, (n−1)s²/χ²_{n−1,1−α/2}}.

(5) Let X have an exponential distribution with parameter λ, which is to be estimated.

Let Z = 2λn x̄.

Then Z has the χ²(2n) distribution, so that for a specified α

P{χ²_{2n,1−α/2} ≤ 2λn x̄ ≤ χ²_{2n,α/2}} = 1−α

or P{χ²_{2n,1−α/2}/(2n x̄) ≤ λ ≤ χ²_{2n,α/2}/(2n x̄)} = 1−α.

Therefore, a (1−α)% confidence interval of λ is

{χ²_{2n,1−α/2}/(2n x̄), χ²_{2n,α/2}/(2n x̄)}.

(6) Let X ~ N(μ1, σ1) and Y ~ N(μ2, σ2), where σ1 = σ2 (unknown). We want a confidence interval for (μ1 − μ2).

Let Z = [(x̄ − ȳ) − (μ1 − μ2)] / (s√(1/n1 + 1/n2)),

where x̄, ȳ, s are as usually defined (s² = [(n1−1)s1² + (n2−1)s2²]/(n1+n2−2)).

Then Z has the t(n1+n2−2) distribution, so that

P{−t_{n1+n2−2,α/2} ≤ [(x̄ − ȳ) − (μ1 − μ2)] / (s√(1/n1 + 1/n2)) ≤ t_{n1+n2−2,α/2}} = 1−α

or P{(x̄ − ȳ) − t_{n1+n2−2,α/2} s√(1/n1 + 1/n2) ≤ (μ1 − μ2) ≤ (x̄ − ȳ) + t_{n1+n2−2,α/2} s√(1/n1 + 1/n2)} = 1−α,

so that a confidence interval for μ1 − μ2 is

{(x̄ − ȳ) − t_{n1+n2−2,α/2} s√(1/n1 + 1/n2), (x̄ − ȳ) + t_{n1+n2−2,α/2} s√(1/n1 + 1/n2)}

with confidence coefficient 1−α.

(7) Let X ~ N(μ1, σ1) and Y ~ N(μ2, σ2), where μ1, μ2 are unknown, and it is required to obtain a confidence interval for σ1²/σ2².

Let Z = (s1²/σ1²)/(s2²/σ2²),

so that Z has the F distribution on (n1−1, n2−1) d.f.

Then

P{F_{n1−1,n2−1,1−α/2} ≤ (s1²/σ1²)/(s2²/σ2²) ≤ F_{n1−1,n2−1,α/2}} = 1−α

or P{(s1²/s2²)/F_{n1−1,n2−1,α/2} ≤ σ1²/σ2² ≤ (s1²/s2²)/F_{n1−1,n2−1,1−α/2}} = 1−α,

so that

{(1/F_{n1−1,n2−1,α/2})(s1²/s2²), (s1²/s2²)/F_{n1−1,n2−1,1−α/2}}

is a confidence interval for σ1²/σ2² with confidence coefficient 1−α.

(8) Simultaneous confidence region for (μ, σ) for a normal distribution.

Let X ~ N(μ, σ), with both μ and σ unknown.

One may choose a confidence region for (μ, σ) using the two relations

P{x̄ − t_{n−1,α/2} s/√n ≤ μ ≤ x̄ + t_{n−1,α/2} s/√n} = 1−α

and P{(n−1)s²/χ²_{n−1,α/2} ≤ σ² ≤ (n−1)s²/χ²_{n−1,1−α/2}} = 1−α,

shown diagrammatically in the original figure as a shaded rectangular region with boundaries such as

t_a = x̄ − t_{n−1,α/2} s/√n, etc., and x_a = (n−1)s²/χ²_{n−1,α/2}.

But it is difficult to find the probability of the sample falling in this shaded (confidence) region.

Alternatively, using the independence of x̄ and s², we choose the confidence region with the help of the relations

P{−N_{α1/2} ≤ (x̄ − μ)/(σ/√n) ≤ N_{α1/2}} = 1 − α1

and P{χ²_{n−1,1−α2/2} ≤ (n−1)s²/σ² ≤ χ²_{n−1,α2/2}} = 1 − α2.

Since x̄ and s² are independent,

P{−N_{α1/2} ≤ (x̄ − μ)/(σ/√n) ≤ N_{α1/2}, χ²_{n−1,1−α2/2} ≤ (n−1)s²/σ² ≤ χ²_{n−1,α2/2}} = (1 − α1)(1 − α2).

Choosing α1, α2 such that (1 − α1)(1 − α2) = 1 − α, we can obtain the boundaries of the confidence region without difficulty (shown by the shaded region in the original figure, with boundaries q = N_{α1/2} and q1 = χ²_{n−1,α2/2}).

Approximate confidence intervals (for large samples)

(i) Let X be a Bernoulli r.v. with

P(X = 1) = p, P(X = 0) = 1 − p. We want to find a confidence interval for p.

For large sample size n we have

(p̂ − p)/√(p(1−p)/n) ~ N(0, 1)

or, approximately,

(p̂ − p)/√(p̂(1−p̂)/n) ~ N(0, 1),

where p̂ is the sample proportion.

Then, approximately,

P{−N_{α/2} ≤ (p̂ − p)/√(p̂(1−p̂)/n) ≤ N_{α/2}} = 1−α

or P{p̂ − N_{α/2}√(p̂(1−p̂)/n) ≤ p ≤ p̂ + N_{α/2}√(p̂(1−p̂)/n)} = 1−α,

so that

{p̂ − N_{α/2}√(p̂(1−p̂)/n), p̂ + N_{α/2}√(p̂(1−p̂)/n)}

is a (1−α)% confidence interval for p.

(ii) For two samples we can similarly find a confidence interval for p1 − p2 as follows:

P{−N_{α/2} ≤ [(p̂1 − p̂2) − (p1 − p2)] / √(p̂(1−p̂)(1/n1 + 1/n2)) ≤ N_{α/2}} = 1−α,

where p̂ = (n1 p̂1 + n2 p̂2)/(n1 + n2),

so that

{(p̂1 − p̂2) − N_{α/2}√(p̂(1−p̂)(1/n1 + 1/n2)), (p̂1 − p̂2) + N_{α/2}√(p̂(1−p̂)(1/n1 + 1/n2))}

is a (1−α)% confidence interval for p1 − p2.

(iii) Let X be a r.v. having mean μ and variance σ², and suppose we want a confidence interval for σ.

For large samples, approximately,

P{−N_{α/2} ≤ (s − σ)/(s/√(2n)) ≤ N_{α/2}} = 1−α

or P{s − N_{α/2} s/√(2n) ≤ σ ≤ s + N_{α/2} s/√(2n)} = 1−α.

Then {s − N_{α/2} s/√(2n), s + N_{α/2} s/√(2n)}

is a (1−α)% confidence interval for σ.

(iv) For two samples we can similarly find a confidence interval for σ1 − σ2 as follows:

P{−N_{α/2} ≤ [(s1 − s2) − (σ1 − σ2)] / (s√(1/(2n1) + 1/(2n2))) ≤ N_{α/2}} = 1−α,

where s² = (n1 s1² + n2 s2²)/(n1 + n2),

so that

{(s1 − s2) − N_{α/2} s√(1/(2n1) + 1/(2n2)), (s1 − s2) + N_{α/2} s√(1/(2n1) + 1/(2n2))}

is a (1−α)% confidence interval for (σ1 − σ2).

(v) Let (X, Y) have a bivariate normal distribution with correlation coefficient ρ, and suppose we want to find a confidence interval for ρ.

We use Fisher's z-transformation

ξ = (1/2) log_e((1+ρ)/(1−ρ))

and z = (1/2) log_e((1+r)/(1−r)),

where r is the correlation coefficient in a sample of size n.

Then √(n−3)(z − ξ) ~ N(0, 1),

so that

P{−N_{α/2} ≤ √(n−3)(z − ξ) ≤ N_{α/2}} = 1−α

or P{z − N_{α/2}/√(n−3) ≤ ξ ≤ z + N_{α/2}/√(n−3)} = 1−α,

so that

{z − N_{α/2}/√(n−3), z + N_{α/2}/√(n−3)}

gives a (1−α)% confidence interval for ξ. From this we can easily obtain the corresponding confidence interval for ρ.

NON-PARAMETRIC INFERENCE

In all problems of statistical inference considered so far we assumed that the distribution of the random variable being sampled is known except for some parameters. In practice, however, the functional form of the distribution is seldom, if ever, known. It is therefore desirable to devise some procedures that are free from this assumption concerning the distribution. Such procedures are commonly referred to as distribution-free or non-parametric methods. The term distribution-free refers to the fact that no assumptions are made about the underlying distribution except that the distribution function being sampled is absolutely continuous or purely discrete. The term non-parametric refers to the fact that there are no parameters involved in the traditional sense of the term parameter used so far.

We will consider only the inferential problem of testing of hypotheses and describe a few non-parametric tests.

Single-sample problems: (A) The problem of fit: The problem of fit is to test the hypothesis that a sample of observations (x1, ..., xn) is from some specified distribution against the alternative that it is from some other distribution. Thus we have to test

H0: X ~ F(x) = F0(x)

against H1: X ~ F(x) ≠ F0(x) for some x.

(i) Chi-square test: Let there be k categories and let pi be the probability of a random observation from F0(x) falling in the ith category (i = 1, 2, ..., k). For a sample of size n, let oi be the observed frequency in the ith category and let ei = n pi be the expected frequency in the ith category under H0.

To test H0 we use the chi-square statistic

χ² = Σ_{i=1}^{k} (oi − ei)²/ei.

The larger the value of χ², the more likely it is that the oi's did not come from F0(x). For large samples the χ²-statistic has a χ² distribution on (k−1) d.f. Thus an approximate level α test is provided by rejecting H0 if

χ² > χ²_{k−1,α}.

(ii) Kolmogorov–Smirnov one-sample test: For the sample (x1, ..., xn) let the empirical distribution function Fn(x) be given by

Fn(x) = 0 if x < x(1)
      = k/n if x(k) ≤ x < x(k+1)  (k = 1, 2, ..., n−1)
      = 1 if x ≥ x(n),

where x(1), x(2), ..., x(n) are the order statistics. Evidently,

Fn(x) = (number of xk's (1 ≤ k ≤ n) that are ≤ x)/n.

For testing H0: F(x) = F0(x) against the two-sided alternative H1: F(x) ≠ F0(x) we use the Kolmogorov–Smirnov statistic

Dn = sup_x |Fn(x) − F0(x)|.

It can be shown that the K-S statistic Dn is completely distribution-free for any continuous distribution F0(x).

At level α, the Kolmogorov–Smirnov test rejects H0 if

Dn > D_{n,α},

where P(Dn > D_{n,α}) ≤ α.

Tables of D_{n,α} for given α and n are available.

Remark 1: For testing H0: F(x) = F0(x) against the one-sided alternatives H1: F(x) > F0(x) or H2: F(x) < F0(x), tests based on the one-sided K-S statistics Dn⁺ and Dn⁻ are also available.

Remark 2: For small samples the χ²-test is not applicable but the K-S test can be applied. For discrete distributions the K-S test is not applicable but the χ²-test can be applied. The K-S test is more powerful than the χ²-test.
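A minimal K-S computation is sketched below as an added illustration (numpy/scipy assumed; the data are simulated). It builds Dn from the empirical c.d.f. at the order statistics and also calls scipy's implementation, here testing against F0 = N(0, 1).

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=30)                 # sample to be tested against F0 = N(0,1)

# Dn = sup_x |Fn(x) - F0(x)|, evaluated at the order statistics
xs = np.sort(x)
n = len(xs)
cdf = stats.norm.cdf(xs)
Dn = max(np.max(np.arange(1, n + 1) / n - cdf), np.max(cdf - np.arange(0, n) / n))

result = stats.kstest(x, 'norm')        # scipy reports the same statistic and a p-value
print(Dn, result.statistic, result.pvalue)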

(B) The problem of location: Let (x1, ..., xn) be a random sample from a distribution F(x) with unknown median ξ, where F(x) is assumed to be continuous in the neighbourhood of ξ. By definition of the median, P(X ≥ ξ) = 1/2. We would like to test the hypothesis

H0: ξ = ξ0 against one-sided or two-sided alternatives.

Sign test: We form the n differences (xi − ξ0), i = 1, 2, ..., n and find out the number, R, of positive differences (differences having positive signs), i.e. those with (xi − ξ0) > 0.

If H0 is true, P(Xi − ξ0 ≥ 0) = 1/2, i = 1, 2, ..., n, and R has a binomial distribution with parameter 1/2. We may use an exact test of H0 based on the binomial distribution. In the case of the one-sided alternative

H1: ξ > ξ0

the sample will have an excess of positive signs, and in the case of

H1: ξ < ξ0

the sample will have a small number of positive signs.

The sign test based on R, for testing H0, can be summarised as follows: the critical values R1,α, R2,α, R_{α/2} are calculated from tables of the binomial distribution. If n > 25, a normal approximation may be used: we take (R − n/2)/√(n/4) ~ N(0, 1).

Paired-sample sign test: Here we assume that we have a random sample of n pairs (xi, yi) giving the differences

Di = xi − yi, i = 1, ..., n.

It is assumed that the distribution of D = X − Y is absolutely continuous with median ξ.

We now have a single sample D1, ..., Dn and we can test H0: ξ = ξ0 (which can be taken to be 0) by the sign test described above.

Remark: The above two sign tests are, respectively, analogous to the single-sample t-test and the paired t-test for testing the location of a normal distribution.

Two-sample problems: Let (x1, ..., xm) and (y1, ..., yn) be independent random samples from two absolutely continuous distributions F_X(x) and F_Y(y), respectively.

Suppose we want to test

H0: F_X(x) = F_Y(x) for all x

against H1: F_X(x) ≠ F_Y(x) for some x.

Run test (Wald–Wolfowitz): We arrange the m x's and n y's in increasing order of size, e.g. X Y Y X X Y Y Y X Y, and count the number of runs. If H0 is true the (m+n) values will be well mixed up and we expect that R, the total number of runs, will be relatively large. But R will be small if the samples come from different populations, i.e. if H0 is false; in the extreme case, if all the values of y are greater than all the values of x, or vice versa, there will be only two runs.

The run test of H0 against H1 at level α is to reject H0 if

R ≤ R_α,

where R_α is the largest integer such that

P(R ≤ R_α | H0) ≤ α.

It can be shown that the distribution of R under H0 is given by

P(R = 2a | H0) = 2 C(m−1, a−1) C(n−1, a−1) / C(m+n, m)

and P(R = 2a+1 | H0) = [C(m−1, a) C(n−1, a−1) + C(m−1, a−1) C(n−1, a)] / C(m+n, m).

Tables of critical values of R based on the above have been given by Swed and Eisenhart.

For large m, n (both greater than 10), R is asymptotically normally distributed with

E(R) = 2mn/(m+n) + 1

and V(R) = 2mn(2mn − m − n) / [(m+n)²(m+n−1)].

Median test: We arrange the x's and y's in ascending order of size and find the median M of the combined sample. Let

V = number of x's which are ≤ the median M.

If V is large, it is reasonable to conclude that the actual median of X is smaller than the median of Y, i.e. H0: F_X(x) = F_Y(x) is rejected in favour of H1: F_X(x) > F_Y(x).

On the other hand, if V is too small, it is reasonable to conclude that the actual median of X is greater than the median of Y, i.e. H0: F_X(x) = F_Y(x) is rejected in favour of H1: F_X(x) < F_Y(x).

For the two-sided alternative, we use the two-sided test.

The median test can be summarised as follows.

It can be shown that the distribution of V under H0 is given by

P(V = u | H0) = C(m, u) C(n, p−u) / C(m+n, p), u = 0, 1, ..., min(m, p),

where m + n = 2p, p a positive integer, and by an analogous expression where m + n = 2p + 1, p a positive integer.

Wilcoxon–Mann–Whitney U test: This is the most widely used two-sample non-parametric test and is a useful alternative to the t-test when the t-test assumptions are in doubt.

The test is, like the run test, based on the pattern of the m x's and n y's arranged in ascending order of size. The Mann–Whitney U statistic is defined as the number of times an X precedes a Y in the combined sample of size m + n. We define

z_ij = 1 if xi < yj, and 0 if xi > yj (i = 1, ..., m; j = 1, ..., n)

and write

U = Σ_{i=1}^{m} Σ_{j=1}^{n} z_ij.

Note that Σ_{j=1}^{n} z_ij is the number of yj's that are larger than xi, and hence U is the number of pairs (xi, yj) for which xi is smaller than yj. For example, suppose the combined sample when ordered is as follows:

X2 < X1 < Y3 < Y2 < X4 < Y1 < X3.

Then U = 7, because there are three values of X < Y1, two values of X < Y2 and two values of X < Y3.

It is observed that U = 0 if all the xi's are larger than all the yj's, and U = mn if all the xi's are smaller than all the yj's. Thus 0 ≤ U ≤ mn. If U is large, the values of Y tend to be larger than X (Y is stochastically larger than X) and this supports the alternative F_X(x) > F_Y(x). Similarly, if U is small, the values of Y tend to be smaller than X and this supports the alternative F_X(x) < F_Y(x).

Therefore, the U-test can be summarised as follows:

H0                  H1                  Reject H0 if
F_X(x) = F_Y(x)     F_X(x) > F_Y(x)     U ≥ C1
F_X(x) = F_Y(x)     F_X(x) < F_Y(x)     U ≤ C2
F_X(x) = F_Y(x)     F_X(x) ≠ F_Y(x)     U ≥ C3 or U ≤ C4

It can be shown that under H0

E(U) = mn/2

and V(U) = mn(m+n+1)/12.

Tables of the distribution of U for small samples are given by Mann and Whitney. For large samples U has an asymptotic normal distribution, i.e.

(U − mn/2) / √(mn(m+n+1)/12) ~ N(0, 1).
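A short sketch of the U statistic is given below as an added illustration (numpy/scipy assumed; the data are made up). It counts the pairs with xi < yj, forms the normal approximation above, and compares with scipy's mannwhitneyu (which, by its convention, counts the pairs with x > y, so the two statistics add to mn when there are no ties).

import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 1.9, 3.1, 2.2])
y = np.array([2.8, 3.5, 2.9, 3.8, 3.0, 2.6])
m, n = len(x), len(y)

U = sum(1 for xi in x for yj in y if xi < yj)               # number of pairs with x < y

z = (U - m * n / 2) / np.sqrt(m * n * (m + n + 1) / 12)     # large-sample N(0,1) value

res = stats.mannwhitneyu(x, y, alternative='two-sided')
print(U, z, res.statistic, m * n - res.statistic, res.pvalue)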

APPENDIX

Distribution of functions of random variables (transformation method)

Theorem: Suppose X is a continuous r.v. with p.d.f. f_X(x). Set 𝒳 = {x : f_X(x) > 0}. Let

(i) y = g(x) define a one-to-one transformation of 𝒳 onto 𝒴;

(ii) the derivative of x = g⁻¹(y) with respect to y be continuous and non-zero for y ∈ 𝒴, where g⁻¹(y) is the inverse function of g, i.e. g⁻¹(y) is that x for which g(x) = y.

Then Y = g(X) is a continuous r.v. with p.d.f.

f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|.

Theorem: Let X1 and X2 be jointly continuous r.v.s with p.d.f. f_{X1,X2}(x1, x2). Set 𝒳 = {(x1, x2) : f(x1, x2) > 0}. Assume that

(i) y1 = g1(x1, x2) and y2 = g2(x1, x2) define a one-to-one transformation of 𝒳 onto 𝒴;

(ii) the first partial derivatives of x1 = g1⁻¹(y1, y2) and x2 = g2⁻¹(y1, y2) are continuous over 𝒴;

(iii) the Jacobian of the transformation is non-zero for (y1, y2) ∈ 𝒴.

Then the joint p.d.f. of Y1 = g1(X1, X2) and Y2 = g2(X1, X2) is given by

f_{Y1,Y2}(y1, y2) = f_{X1,X2}(g1⁻¹(y1, y2), g2⁻¹(y1, y2)) |J|,

where |J| is the absolute value of the determinant

J = | ∂x1/∂y1  ∂x1/∂y2 |
    | ∂x2/∂y1  ∂x2/∂y2 |

χ²-distribution

Definition: A continuous r.v. X is said to have the χ²-distribution on n degrees of freedom if its p.d.f. is given by

f(x) = (1/(2^{n/2} Γ(n/2))) x^{n/2−1} e^{−x/2}, x ≥ 0
     = 0, x < 0.

The m.g.f. of X is given by

M_X(t) = E e^{tX}

= (1/(2^{n/2} Γ(n/2))) ∫₀^∞ x^{n/2−1} e^{−x(1−2t)/2} dx

= (1/(2^{n/2} Γ(n/2))) · Γ(n/2) · (2/(1−2t))^{n/2}

= (1 − 2t)^{−n/2}  (t < 1/2).

From this we can easily show that

E(X) = n and V(X) = 2n.

For n ≤ 2 the p.d.f. of χ²(n) steadily decreases as x increases, while for n > 2 there is a unique maximum at x = n − 2.

Theorem: Let X1, X2, ..., Xn be n independent standard normal r.v.s, i.e. Xi ~ N(0, 1), i = 1, ..., n. Then

Y = Σ_{i=1}^{n} Xi² has a χ²-distribution on n d.f.

Proof: Let X be N(0, 1). The m.g.f. of X² is given by

M_{X²}(t) = E(e^{tX²})

= (1/√(2π)) ∫_{−∞}^{∞} e^{tx² − x²/2} dx

= (1/√(2π)) ∫_{−∞}^{∞} e^{−x²(1−2t)/2} dx

= 1/√(1−2t)

= (1 − 2t)^{−1/2},

which shows that X² ~ χ²(1). Then the m.g.f. of Y = Σ_{i=1}^{n} Xi² is given by

M_Y(t) = [M_{X²}(t)]^n = (1 − 2t)^{−n/2},

which shows that Y ~ χ²(n).

Theorem: Let Y1, Y2, ..., Yk be independent r.v.s with χ²-distributions on n1, ..., nk degrees of freedom, respectively.

Then Z = Σ_{i=1}^{k} Yi ~ χ²(n1 + n2 + ... + nk).

Proof: The m.g.f. of Z is

M_Z(t) = E e^{tZ} = E e^{t Σ Yi} = ∏_{i=1}^{k} E(e^{t Yi}) = (1 − 2t)^{−(n1 + ... + nk)/2},

which shows that Z ~ χ²(n1 + ... + nk).

Corollary: Let (x1, ..., xn) be a random sample from a normal distribution N(μ, σ). Then Σ_{i=1}^{n} (xi − μ)²/σ² has the χ² distribution on n d.f.

Theorem: Let (x1, ..., xn) be a random sample from a normal distribution N(μ, σ). Let x̄ = Σ xi/n and s² = (1/(n−1)) Σ(xi − x̄)² be the sample mean and sample variance. Then (n−1)s²/σ² has the χ² distribution on (n−1) d.f.

Theorem: For large n, √(2χ²) can be shown to be approximately normally distributed with mean √(2n−1) and standard deviation unity.

Theorem: Assume that Y has distribution function F_Y which satisfies some regularity conditions and which has r unknown parameters θ1, θ2, ..., θr, and that (y1, ..., yn) is a random sample on Y. Let θ̂1, ..., θ̂r be the m.l.e.s of the θ's. Suppose the sample is distributed in k non-overlapping intervals {I_j}, where I_j = {y : a_{j−1} < y ≤ a_j}, j = 1, ..., k (a_0 = −∞, a_k = ∞). Let x1, ..., xk be the numbers of sample values falling in these intervals, respectively. If we define

p̂_j = P{Y falls in I_j}, j = 1, ..., k,

where θ̂1, ..., θ̂r replace θ1, ..., θr in F_Y, then the statistic

Z = Σ_{j=1}^{k} (x_j − n p̂_j)² / (n p̂_j)

is approximately distributed as χ² on k − r − 1 d.f. as n gets larger.

Student's t-distribution

Definition: A continuous r.v. X is said to have the t-distribution on n d.f. if its p.d.f. is given by

f(x) = [Γ((n+1)/2)/(Γ(n/2)√(nπ))] (1 + x²/n)^{−(n+1)/2}, −∞ < x < ∞.

Remark: For n = 1 the p.d.f. is

f(x) = (1/π) · 1/(1 + x²), −∞ < x < ∞,

which shows that it is a Cauchy distribution. We will, therefore, assume that n > 1.

Remark: The p.d.f. of the t-distribution is symmetric about the origin. For large n the t-distribution tends to the normal distribution. For small n, however, the t-distribution deviates considerably from the normal; in fact, if T ~ t(n) and Z ~ N(0, 1),

P{|T| ≥ t0} ≥ P{|Z| > t0}.

Moments: Since the distribution is symmetrical about the origin, μ_{2r+1} = 0.

For 2r < n,

μ_{2r} = E(X^{2r}) = [2Γ((n+1)/2)/(Γ(n/2)√(nπ))] ∫₀^∞ x^{2r} (1 + x²/n)^{−(n+1)/2} dx.

Theorem: Let X ~ N(0, 1) and Y ~ χ²(n), and let X and Y be independent. Then U = X/√(Y/n) has the t-distribution on n d.f.
