STATISTICAL INFERENCE
BY DR RAJIV SAKSENA
DEPARTMENT OF STATISTICS
UNIVERSITY OF LUCKNOW
We know that statistical data is nothing but a random sample of observations drawn from a population described by a random variable whose probability distribution is unknown or partly unknown, and we try to learn about the properties of the population from the properties of the sample. This inductive process of going from the known sample to the unknown population is called 'Statistical Inference'.
Formally, let X be a random variable describing the population under investigation. Suppose X has p.m.f. fθ(x) = P(X = x) or p.d.f. fθ(x), which depends on some unknown parameter θ (single or vector valued) that may take any value in a set Ω (called the parameter space). We assume that the functional form of fθ(x) is known but not the parameter θ (except that θ ∈ Ω). For example, the family of distributions {fθ(x), θ ∈ Ω} may be the family of Poisson distributions {P(λ), λ > 0} or of normal distributions {N(μ, σ²), −∞ < μ < ∞, σ > 0}.
POINT ESTIMATION
Definition: A random sample of size n from the distribution of X is a set of independent and identically distributed random variables (x1, x2, …, xn), each of which has the same distribution as X. The joint probability (density) of the sample is given by fθ(x1) fθ(x2) ⋯ fθ(xn).
Definition: A statistic T = T(x1, x2, …, xn) is any function of the sample values which does not depend on the unknown parameter θ. Evidently, T is a random variable which has its own probability distribution (called the 'sampling distribution' of T).
For example, x̄ = (1/n) Σ xi, s² = (1/(n−1)) Σ (xi − x̄)², X(1) = min(x1, x2, …, xn) and X(n) = max(x1, x2, …, xn) are some statistics.
If we use the statistic T to estimate the unknown parameter θ, it is called an estimator (or point estimator) of θ, and the value of T obtained from a given sample is its 'estimate'.
Theorem: Let (X1, X2, …, Xn) be a random sample of n observations on X with mean E(X) = μ and variance Var(X) = σ². Let the sample mean and sample variance be x̄ = (1/n) Σ xi and s² = (1/n) Σ (xi − x̄)². Then
(i) E(x̄) = μ
(ii) V(x̄) = σ²/n
(iii) E(s²) = ((n−1)/n) σ²
Proof: We have
E(x̄) = E((1/n) Σ xi) = (1/n) Σ E(xi) = μ
V(x̄) = V((1/n) Σ xi) = (1/n²) Σ V(xi) = σ²/n
For (iii), using the identity Σ (xi − x̄)² = Σ (xi − μ)² − n(x̄ − μ)²,
E(ns²) = E Σ (xi − x̄)² = Σ E(xi − μ)² − n E(x̄ − μ)² = nσ² − n(σ²/n) = (n − 1)σ²
so that E(s²) = ((n−1)/n) σ².
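The bias factor (n − 1)/n in (iii) is easy to verify numerically. The following is a minimal Python sketch (not part of the original notes; NumPy assumed, the N(μ, σ²) population and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)                 # sample means
s2 = samples.var(axis=1)                    # divisor n, as in the theorem

print(xbar.mean(), mu)                      # E(xbar) ~ mu
print(xbar.var(), sigma**2 / n)             # V(xbar) ~ sigma^2 / n
print(s2.mean(), (n - 1) / n * sigma**2)    # E(s^2) ~ (n-1)/n * sigma^2
```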
PROPERTIES OF ESTIMATORS
UNBIASEDNESS: An estimator T is said to be unbiased for θ if E(T) = θ for all θ ∈ Ω.
𝐸𝑥𝑎𝑚𝑝𝑙𝑒. If (𝑥1 , 𝑥2, … , 𝑥𝑛 ) is a random sample from any population with mean 𝜇 and variance 𝜎 2,
the sample mean 𝑥̅ is an unbiased estimator of 𝜇 but the sample variance 𝑆 2 is not an unbiased
estimator of 𝜎 2 .
However, ns²/(n − 1) = (1/(n−1)) Σ (xi − x̄)² is an unbiased estimator of σ².
Ex. If (x1, x2, …, xn) is a random sample from a normal distribution N(μ, 1), show that T = (1/n) Σ xi² − 1 is an unbiased estimator of μ².
Soln. E(T) = E[(1/n) Σ xi² − 1] = (1/n) Σ E(xi²) − 1 = (1/n) Σ (μ² + 1) − 1 = μ²
since E(xi²) = V(xi) + {E(xi)}² = 1 + μ².
Example: Let (x1, x2, …, xn) be a random sample from a Bernoulli distribution fθ(x) = θˣ(1 − θ)¹⁻ˣ (x = 0, 1). Show that T = Y(Y − 1)/(n(n − 1)) is an unbiased estimator of θ², where Y = Σ xi.
Soln: We know that E(xi) = θ and V(xi) = θ(1 − θ), so that E(Y) = nθ and V(Y) = nθ(1 − θ). Now
E[Y(Y − 1)] = E(Y²) − E(Y) = V(Y) + {E(Y)}² − E(Y) = nθ(1 − θ) + n²θ² − nθ = n(n − 1)θ²
so that E(T) = E[Y(Y − 1)/(n(n − 1))] = θ².
Example: Show that the mean x̄ of a random sample of size n from the exponential distribution fθ(x) = (1/θ) e^(−x/θ) (x > 0) is an unbiased estimator of θ and has variance θ²/n.
Example: Let (x1, x2, …, xn) be a random sample from a normal distribution with mean 0 and variance θ (0 < θ < ∞). Show that T = Σ xi²/n is an unbiased estimator of θ and has variance 2θ²/n.
Soln: E(T) = (1/n) Σ E(xi²) = θ.
Also E(xi⁴) = μ₄ = 3θ², so
V(T) = V((1/n) Σ xi²) = (1/n²) Σ V(xi²) = (1/n²) Σ [E(xi⁴) − {E(xi²)}²] = (1/n²) Σ [3θ² − θ²] = 2θ²/n.
Example: Let (x1, x2, …, xn) be a random sample from the rectangular distribution R(0, θ) having p.d.f.
fθ(x) = 1/θ for 0 ≤ x ≤ θ (θ > 0), and 0 otherwise.
Show that T1 = 2x̄, T2 = ((n+1)/n) Yn and T3 = (n+1) Y1 are all unbiased for θ, where Y1 = min(x1, …, xn) and Yn = max(x1, …, xn).
Soln: Since E(xi) = θ/2 and V(xi) = θ²/12,
E(T1) = 2E(x̄) = θ and V(T1) = 4V(x̄) = θ²/(3n).
The d.f. of Yn is
FYn(y) = P(Yn ≤ y) = P(max(x1, …, xn) ≤ y) = P(x1 ≤ y, …, xn ≤ y) = [P(X ≤ y)]ⁿ = (y/θ)ⁿ = yⁿ/θⁿ
so the p.d.f. of Yn is
gYn(y) = n y^(n−1)/θⁿ for 0 ≤ y ≤ θ, and 0 elsewhere.
Hence E(Yn) = ∫₀^θ y · n y^(n−1)/θⁿ dy = (n/(n+1)) θ, or E(((n+1)/n) Yn) = θ.
[We can check that V(T2) = θ²/(n(n+2)).]
Similarly,
FY1(y) = P(Y1 ≤ y) = 1 − P(min(x1, …, xn) > y) = 1 − [1 − y/θ]ⁿ
so the p.d.f. of Y1 is
gY1(y) = n(θ − y)^(n−1)/θⁿ for 0 ≤ y ≤ θ, and 0 elsewhere.
Hence, integrating by parts,
E(Y1) = ∫₀^θ y · n(θ − y)^(n−1)/θⁿ dy = θ/(n+1)
so that E(T3) = (n+1) E(Y1) = θ.
[We can check that V(T3) = (n/(n+2)) θ².]
Therefore V(T2) < V(T1) < V(T3).
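The ordering of the three variances is easy to see in simulation. A minimal Python sketch (illustrative θ and n, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 3.0, 10, 200_000
x = rng.uniform(0, theta, size=(reps, n))

t1 = 2 * x.mean(axis=1)                  # 2 * sample mean
t2 = (n + 1) / n * x.max(axis=1)         # ((n+1)/n) * Y_n
t3 = (n + 1) * x.min(axis=1)             # (n+1) * Y_1

for name, t, v in [("T1", t1, theta**2 / (3 * n)),
                   ("T2", t2, theta**2 / (n * (n + 2))),
                   ("T3", t3, n * theta**2 / (n + 2))]:
    print(name, t.mean(), t.var(), "theory:", v)   # all means ~ theta
```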
Example: Let (x1, x2, …, xn) be a random sample from the rectangular distribution R(θ, 2θ) having p.d.f.
f(x, θ) = 1/θ for θ ≤ x ≤ 2θ, and 0 elsewhere.
Show that T1 = ((n+1)/(2n+1)) x(n), T2 = ((n+1)/(n+2)) x(1), T3 = ((n+1)/(5n+4)) [2x(n) + x(1)] and T4 = (2/3) x̄ are all unbiased for θ.
Soln: We can show that the statistics x(n) and x(1) have p.d.f.s given by
f_x(n)(y) = n(y − θ)^(n−1)/θⁿ, θ ≤ y ≤ 2θ
f_x(1)(y) = n(2θ − y)^(n−1)/θⁿ, θ ≤ y ≤ 2θ
from which E(x(n)) = ((2n+1)/(n+1)) θ and E(x(1)) = ((n+2)/(n+1)) θ; since also E(x̄) = 3θ/2, the unbiasedness of T1, T2, T3 and T4 follows.
Example: Let y1, y2, y3 be the order statistics of a random sample of size 3 from a uniform distribution having p.d.f. f(x, θ) = 1/θ (0 ≤ x ≤ θ). Show that 4y1, 2y2 and (4/3) y3 are all unbiased estimators of θ.
Soln: The order statistics have p.d.f.s
f_y1(y) = 3(θ − y)²/θ³, 0 ≤ y ≤ θ
f_y2(y) = 6y(θ − y)/θ³, 0 ≤ y ≤ θ
f_y3(y) = 3y²/θ³, 0 ≤ y ≤ θ
so that E(y1) = θ/4, E(y2) = θ/2 and E(y3) = 3θ/4.
* If y1, y2 are two unbiased estimators of θ with variances σ1², σ2² and correlation coefficient ρ between them, the linear combination ky1 + (1 − k)y2 which is unbiased and has minimum variance is obtained for k = (σ2² − ρσ1σ2)/(σ1² + σ2² − 2ρσ1σ2).
* If y1, y2, …, yn are independent unbiased estimators of θ with variances σi² (i = 1, 2, …, n), the linear combination with minimum variance is
Y = k1y1 + k2y2 + ⋯ + knyn
where ki = (1/σi²) / Σⱼ (1/σj²), i.e.
Y = [y1/σ1² + y2/σ2² + ⋯ + yn/σn²] / [1/σ1² + 1/σ2² + ⋯ + 1/σn²].
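The second result is just inverse-variance weighting. A minimal Python sketch (the function name and the sample numbers are illustrative):

```python
import numpy as np

def combine(estimates, variances):
    """Minimum-variance unbiased linear combination of independent
    unbiased estimates: weights proportional to 1/variance."""
    w = 1.0 / np.asarray(variances, dtype=float)
    return np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)

# Example: three independent unbiased estimates of the same theta
print(combine([1.2, 0.9, 1.1], [0.04, 0.09, 0.01]))
```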
Example: Let T be an unbiased estimator of θ. Does it imply that T² and √T are unbiased for θ² and √θ respectively? (No: for instance E(T²) = θ² + V(T) > θ² whenever V(T) > 0.)
Example: Let y1, y2 be independent unbiased estimators of θ having finite variances (σ1², σ2², say). Obtain a linear combination of y1, y2 which is unbiased and has the smallest variance.
Soln: Let Y = k y1 + k′ y2. For unbiasedness, evidently k + k′ = 1, or k′ = 1 − k. Minimising V(Y) = k²σ1² + (1 − k)²σ2² gives k = σ2²/(σ1² + σ2²), so
Y = (σ2²/(σ1² + σ2²)) y1 + (σ1²/(σ1² + σ2²)) y2 = [y1/σ1² + y2/σ2²] / [1/σ1² + 1/σ2²].
Remarks: (i) An unbiased estimator may not exist. Let X be a random variable with a Bernoulli distribution; based on a single observation, E T(X) = T(0) + {T(1) − T(0)}θ is linear in θ, so no unbiased estimator exists for a nonlinear function such as 1/θ.
(ii) An unbiased estimator may be absurd. Let X be a random variable having the Poisson distribution P(λ) and suppose we want to estimate g(λ) = e^(−3λ). Consider a sample of one observation and the estimator T(X) = (−2)^X. Then E(T) = e^(−3λ), so T is an unbiased estimator of e^(−3λ); but T(x) > 0 for x even and T(x) < 0 for x odd, which is absurd since e^(−3λ) is always positive.
(iii) Instead of the parameter θ we may be interested in estimating a function g(θ). g(θ) is said to be 'estimable' if there exists an estimator T such that E(T) = g(θ) for all θ ∈ Ω.
Minimum Variance Unbiased (MVU) estimators : The class of unbiased estimators may, in
general, be quite large and we would like to choose the best estimator from this class. Among
two estimators of 𝜃 which are both unbiased , we would choose the one with smaller variance.
The reason for doing this rests on the interpretation of variance as a measure of concentration
about the mean. Thus, if T is unbiased for θ, then by Chebyshev's inequality
P{|T − θ| ≤ ε} ≥ 1 − Var(T)/ε².
Therefore, the smaller 𝑉𝑎𝑟(𝑇) is, the larger the lower bound of the probability of concentration of T
about 𝜃 becomes. Consequently, within the restricted class of unbiased estimators we would choose
the estimator with the smallest variance.
Definition: An estimator T₀ is called a uniformly minimum variance unbiased (UMVU) estimator of θ (or of g(θ)) if it is unbiased and has the smallest variance within the class of unbiased estimators of θ (or of g(θ)) for all θ ∈ Ω. That is, if T is any other unbiased estimator of θ, then V(T₀) ≤ V(T) for all θ ∈ Ω.
Suppose we decide to restrict ourselves to the class of all unbiased estimators with finite variance. The problem arises as to how we find a UMVU estimator, if such an estimator exists. For this we would first determine a lower bound for the variances of all estimators (in the class of unbiased estimators under consideration) and then would try to determine an unbiased estimator whose variance is equal to this lower bound. The lower bound is given by the Cramér–Rao inequality, for which we assume the following regularity conditions:
(i) the parameter space Ω is an open interval;
(ii) the support of f(x; θ) does not depend on θ;
(iii) (∂/∂θ) f(x; θ) exists for all θ ∈ Ω;
(iv) ∫…∫ f(x1; θ) f(x2; θ) ⋯ f(xn; θ) dx1 ⋯ dxn may be differentiated under the integral sign;
(v) ∫…∫ T(x1, …, xn) f(x1; θ) ⋯ f(xn; θ) dx1 ⋯ dxn may be differentiated under the integral sign, where T(x1, …, xn) is any unbiased estimator of θ.
Theorem (Cramér–Rao inequality): Under these conditions, for any unbiased estimator T of θ,
V(T) ≥ 1 / (n E[(∂/∂θ) log f(x; θ)]²).
Proof: We have
∫ f(xi; θ) dxi = 1, i = 1, 2, …, n.
Differentiating with respect to θ,
∫ (∂/∂θ) f(xi; θ) dxi = 0
or ∫ [(∂/∂θ) log f(xi; θ)] f(xi; θ) dxi = 0 ……(A)
or E[(∂/∂θ) log f(xi; θ)] = 0 ……(1)
Since T is unbiased,
E(T) = ∫…∫ T(x1, …, xn) [∏ f(xi; θ)] dx1 ⋯ dxn = θ
and, differentiating under the integral sign,
(∂/∂θ) ∫…∫ T(x1, …, xn) [∏ f(xi; θ)] dx1 ⋯ dxn = 1 ……(2)
But
(∂/∂θ) ∏ f(xi; θ) = Σi [(∂/∂θ) f(xi; θ)] ∏_{j≠i} f(xj; θ) = Σi [1/f(xi; θ)] [(∂/∂θ) f(xi; θ)] ∏j f(xj; θ) = [Σi (∂/∂θ) log f(xi; θ)] ∏j f(xj; θ)
so (2) gives
∫…∫ T(x1, …, xn) [Σ (∂/∂θ) log f(xi; θ)] f(x1; θ) ⋯ f(xn; θ) dx1 ⋯ dxn = 1
i.e. E(TZ) = 1 ……(3)
where Z = Σ (∂/∂θ) log f(xi; θ).
By (1),
E(Z) = Σ E[(∂/∂θ) log f(xi; θ)] = 0 ……(4)
and, by the independence of the xi,
Var(Z) = Σ E[(∂/∂θ) log f(xi; θ)]² = n E[(∂/∂θ) log f(x; θ)]² ……(5)
From (3) and (4), Cov(T, Z) = E(TZ) − E(T)E(Z) = 1. Therefore
ρ²(T, Z) = [Cov(T, Z)]²/(V(T) V(Z)) = 1/(V(T) V(Z)) ≤ 1
so that
V(T) ≥ 1/V(Z) = 1/(n E[(∂/∂θ) log f(x; θ)]²).
(i) An unbiased estimator T attains the lower bound 1/(n E[(∂/∂θ) log f(x; θ)]²) if and only if T is of the form T = θ + b_θ Z, where Z = Σ (∂/∂θ) log f(xi; θ) and b_θ may depend on θ.
Proof: V(T) = 1/(n E[(∂/∂θ) log f(x; θ)]²) iff ρ(T, Z) = 1, iff T is a linear function of Z, say T = a_θ + b_θ Z. But E(T) = a_θ = θ, i.e. T = θ + b_θ Z.
Example (a non-regular case): Let
f(x, θ) = 1/θ, 0 ≤ x ≤ θ.
Then
(∂/∂θ) log f(x, θ) = −1/θ, E[(∂/∂θ) log f(x, θ)]² = 1/θ²
so formally CRB = θ²/n. But we know that T = ((n+1)/n) X(n) is an unbiased estimator (indeed the UMVUE) whose variance is
V(T) = θ²/(n(n+2)) < θ²/n.
This is possible because the regularity conditions fail here (the support depends on θ).
(ii) If g(θ) is an estimable function for which an unbiased estimator is T (i.e. E(T) = g(θ)), then the C–R inequality becomes
V(T) ≥ [g′(θ)]² / (n E[(∂/∂θ) log f(x; θ)]²).
(iii) Under the regularity conditions,
E[(∂/∂θ) log f(x; θ)]² = −E[(∂²/∂θ²) log f(x; θ)]
which is often more convenient for computing the bound.
(iv) If an unbiased estimator exists whose variance is equal to the lower bound CRB = 1/(n E[(∂/∂θ) log f(x; θ)]²), then it is the UMVUE.
(v) If there is no unbiased estimator whose variance equals the CRB, it does not mean that a UMVUE will not exist. Such estimators can be found (if they exist) by other methods.
(vi) In the case of distributions not satisfying the regularity conditions (e.g. the rectangular distribution), UMVU estimators, if they exist, can be found by other methods. For such cases the UMVU estimator may have variance less than the CRB.
Example: Let (x1, …, xn) be a random sample from the Bernoulli distribution fθ(x) = θˣ(1 − θ)¹⁻ˣ (x = 0, 1). Show that x̄ = (1/n) Σ xi is the UMVUE of θ.
Soln:
(∂/∂θ) log f(x, θ) = x/θ − (1 − x)/(1 − θ) = (x − θ)/(θ(1 − θ))
so that
E[(∂/∂θ) log f(x, θ)]² = E(x − θ)²/(θ²(1 − θ)²) = θ(1 − θ)/(θ²(1 − θ)²) = 1/(θ(1 − θ)).
By the C–R inequality, CRB = θ(1 − θ)/n.
Now E(x̄) = θ and Var(x̄) = θ(1 − θ)/n, which is equal to the CRB. Hence x̄ is the UMVUE of θ.
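The attainment of the bound by x̄ can be checked by simulation. A minimal Python sketch (θ and n are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.3, 25, 200_000

x = rng.binomial(1, theta, size=(reps, n))
xbar = x.mean(axis=1)

crb = theta * (1 - theta) / n      # Cramer-Rao lower bound for this model
print(xbar.mean(), theta)          # unbiased
print(xbar.var(), crb)             # variance attains the bound
```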
Example: Let (x1, …, xn) be a random sample from the binomial distribution
f(x, θ) = C(m, x) θˣ(1 − θ)^(m−x), x = 0, 1, …, m (0 < θ < 1).
Show that X̄/m is the UMVUE of θ.
Soln: log f(x, θ) = log C(m, x) + x log θ + (m − x) log(1 − θ)
(∂/∂θ) log f(x, θ) = x/θ − (m − x)/(1 − θ) = (x − mθ)/(θ(1 − θ))
so that
E[(∂/∂θ) log f(x, θ)]² = E(x − mθ)²/(θ²(1 − θ)²) = mθ(1 − θ)/(θ²(1 − θ)²) = m/(θ(1 − θ)).
The CRB is θ(1 − θ)/(mn). Now E(X̄/m) = θ and Var(X̄/m) = θ(1 − θ)/(mn), so that X̄/m is the UMVUE of θ.
Example: Let (x1, …, xn) be a random sample from the Poisson distribution
f(x, θ) = e^(−θ) θˣ/x!, x = 0, 1, … (θ > 0).
(∂/∂θ) log f(x, θ) = −1 + x/θ = (x − θ)/θ
E[(∂/∂θ) log f(x, θ)]² = E(x − θ)²/θ² = 1/θ
so the CRB is θ/n. Now E(x̄) = θ and Var(x̄) = θ/n, so that x̄ is the UMVUE of θ.
Example: Let (x1, …, xn) be a random sample from N(θ, σ²) with σ known.
Soln: f(x, θ) = (1/(σ√(2π))) e^(−(x−θ)²/(2σ²))
log f(x, θ) = log(1/(σ√(2π))) − (x − θ)²/(2σ²)
or (∂/∂θ) log f(x, θ) = (x − θ)/σ²
E[(∂/∂θ) log f(x, θ)]² = E(x − θ)²/σ⁴ = 1/σ²
The CRB is σ²/n. Now E(x̄) = θ and V(x̄) = σ²/n, so that x̄ is the UMVUE of θ.
Example: Let (x1, …, xn) be a random sample from a normal distribution N(μ, θ) where μ is known and θ is the variance to be estimated. Show that S² = Σ (xi − μ)²/n is the UMVUE of θ.
Soln: f(x; θ) = (1/√(2πθ)) e^(−(x−μ)²/(2θ))
log f(x; θ) = −(1/2) log(2π) − (1/2) log θ − (x − μ)²/(2θ)
or (∂/∂θ) log f(x, θ) = −1/(2θ) + (x − μ)²/(2θ²) = [(x − μ)² − θ]/(2θ²)
E[(∂/∂θ) log f(x, θ)]² = E[(x − μ)² − θ]²/(4θ⁴) = (3θ² − 2θ² + θ²)/(4θ⁴) = 1/(2θ²)
The CRB is 2θ²/n. Consider the estimator S² = Σ (xi − μ)²/n, for which E(S²) = θ and V(S²) = 2θ²/n, so that S² is the UMVUE of θ.
Result: A UMVU estimator is unique, in the sense that if T₀ and T₁ are both UMVU estimators then T₀ = T₁ almost surely (i.e. P(T₀ ≠ T₁) = 0).
Proof: Consider
T = (1/2)(T₀ + T₁)
which is also unbiased, with
V(T) = (1/4)[V(T₀) + V(T₁) + 2ρ√(V(T₀)V(T₁))] = ((1 + ρ)/2) V(T₀)
since V(T₀) = V(T₁). By definition of UMVU, V(T) ≥ V(T₀). It follows that ρ ≥ 1; therefore ρ = 1, so that, for every θ, T₀ and T₁ are linearly related, i.e.
T₀ = a + bT₁
where a, b are constants (which may depend on θ) and b ≥ 0. Taking expectations and variances we get
θ = a + bθ and V(T₀) = b²V(T₁)
whence b = 1 and a = 0, i.e. T₀ = T₁.
CONSISTENCY
Definition: A sequence of estimators {Tn}, n = 1, 2, …, of a parameter θ is said to be consistent if, as n → ∞, Tn converges to θ in probability, i.e. for every ε > 0
P{|Tn − θ| > ε} → 0, or equivalently P{|Tn − θ| ≤ ε} → 1, as n → ∞.
Remarks:
(i) As the sample size increases, a consistent estimator becomes more and more concentrated about θ.
(iv) We will show later that if {Tn} is a sequence of estimators such that E(Tn) → θ and V(Tn) → 0 as n → ∞, then {Tn} is consistent.
Examples:
1. Let (x1, …, xn) be a random sample from any distribution with finite mean θ. Then it follows from the law of large numbers that x̄ → θ in probability, so that x̄ is consistent for θ. If the distribution also has finite variance (σ², say), then V(x̄) = σ²/n → 0, and consistency also follows from Remark (iv). It can be shown that the sample median is also consistent for θ (for distributions symmetric about θ).
2. Suppose (x1, …, xn) is a random sample from N(μ, σ²). Let
x̄ = Σ xi/n
s² = (1/n) Σ (xi − x̄)²
s′² = (1/(n−1)) Σ (xi − x̄)² = (n/(n−1)) s².
3. Let (x1, …, xn) be a random sample from the rectangular distribution R(0, θ) and let Y1 = min(x1, …, xn). Consider the estimator T = (n + 1)Y1. This is unbiased. Now, for any ε (> 0),
P{|T − θ| ≤ ε} = P{|Y1 − θ/(n+1)| ≤ ε/(n+1)}
= (1/θⁿ)[(θ − (θ−ε)/(n+1))ⁿ − (θ − (θ+ε)/(n+1))ⁿ]
= (nⁿ/(n+1)ⁿ)[(1 + ε/(nθ))ⁿ − (1 − ε/(nθ))ⁿ]
→ e⁻¹ (e^(ε/θ) − e^(−ε/θ)) as n → ∞.
Thus P{|T − θ| ≤ ε} does not tend to 1, so T, although unbiased, is not consistent for θ.
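Since the limit above is a constant strictly below 1, the concentration of T about θ does not improve with n. A minimal Python sketch of this (illustrative θ and ε; the minimum of n uniforms is simulated directly from its d.f., not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, eps, reps = 2.0, 0.1, 200_000

for n in (10, 100, 1000, 10_000):
    # P(Y1 > y) = (1 - y/theta)^n, so Y1 = theta * (1 - U^(1/n)), U ~ U(0,1)
    y1 = theta * (1.0 - rng.random(reps) ** (1.0 / n))
    t = (n + 1) * y1                       # unbiased estimator of theta
    print(n, np.mean(np.abs(t - theta) <= eps))

# limiting probability e^{-1}(e^{eps/theta} - e^{-eps/theta}) < 1
print(np.exp(-1) * (np.exp(eps / theta) - np.exp(-eps / theta)))
```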
(Continuing Example 2.) We have
E(s²) = ((n−1)/n) σ², V(s²) = 2σ⁴(n − 1)/n²
E(s′²) = σ², V(s′²) = 2σ⁴/(n − 1)
By Remark (iv) above, s² and s′² are both consistent for σ²; s² is biased while s′² is unbiased.
Soln: E(X̄/þ) = θ and V(X̄/þ) = θ²/(nþ) → 0, so that X̄/þ is consistent for θ.
Theorem: If E(Tn) = θn → θ and V(Tn) → 0 as n → ∞, then {Tn} is consistent for θ.
Proof: By Chebyshev's inequality,
P{|Tn − θ| > ε} ≤ E(Tn − θ)²/ε²
= (1/ε²) E[(Tn − θn) + (θn − θ)]²
= (1/ε²) [E(Tn − θn)² + (θn − θ)² + 2(θn − θ)E(Tn − θn)]
= (1/ε²) [V(Tn) + (θn − θ)²] → 0
so that P{|Tn − θ| ≤ ε} → 1 as n → ∞.
Properties: (i) If {Tn} is consistent for θ and g is a continuous function, then g(Tn) is consistent for g(θ). Since g is continuous, given ε (> 0) we can choose ε₁ (> 0) such that |g(Tn) − g(θ)| ≤ ε whenever |Tn − θ| ≤ ε₁. Therefore
P{|g(Tn) − g(θ)| ≤ ε} ≥ P{|Tn − θ| ≤ ε₁} → 1.
(ii) If {Tn} is consistent for θ (θ > 0 and Tn non-negative) then √Tn is consistent for √θ. Indeed, since √Tn + √θ ≥ √θ,
|√Tn − √θ| = |Tn − θ|/(√Tn + √θ) ≤ |Tn − θ|/√θ
so that P{|√Tn − √θ| ≥ ε} ≤ P{|Tn − θ| ≥ ε√θ} → 0.
(iii) If {Tn} is consistent for θ and {T′n} is consistent for θ′, then {Tn ± T′n} is consistent for θ ± θ′. Indeed,
P{|(Tn ± T′n) − (θ ± θ′)| ≥ ε} ≤ P{|Tn − θ| + |T′n − θ′| ≥ ε}
≤ P{|Tn − θ| ≥ ε/2} + P{|T′n − θ′| ≥ ε/2} → 0
as n → ∞.
(iv) If Tn and T′n are consistent for θ and θ′ respectively, then TnT′n is consistent for θθ′. Indeed, writing TnT′n = (1/4)[(Tn + T′n)² − (Tn − T′n)²], properties (i) and (iii) give
TnT′n → (1/4)[(θ + θ′)² − (θ − θ′)²] = θθ′ in probability.
EFFICIENCY:
If T1 and T2 are two unbiased estimators of a parameter θ, each having finite variance, T1 is said to be more efficient than T2 if V(T1) < V(T2). The (relative) efficiency of T1 relative to T2 is defined by
Eff(T1/T2) = V(T2)/V(T1).
It is usual to judge the efficiency of an unbiased estimator by comparing its variance with the Cramér–Rao lower bound (CRB).
Definition: Assume that the regularity conditions of the C–R inequality hold (we call it a regular situation) for the family {f(x, θ), θ ∈ Ω}. An unbiased estimator T* of θ is called most efficient if V(T*) equals the CRB. In this situation, the 'efficiency' of any other unbiased estimator T of θ is defined by
Eff(T) = V(T*)/V(T).
Remarks:
(i) A most efficient estimator in this sense may fail to exist:
(a) in regular situations where no unbiased estimator has variance equal to the CRB, although a UMVUE may exist and may be found by other methods;
(b) in non-regular situations, where a UMVUE may exist and may be found by other methods.
(ii) In the examples considered earlier, all the UMVUEs whose variances equalled the CRB are most efficient.
19
Example Consider 𝑎, 𝑟, 𝑠(𝑥1 , … 𝑥𝑛 ) from a normal distribution𝑁(𝜇, 𝜃) where mean 𝜇 is known and
variance 𝜃(0 < 𝜃 < ∞ ) is to be estimated
1
We has seen that 𝑠 2 = 𝑛 ∑𝑛1(𝑥𝑖 − 𝜇)2 is UMVUE of 𝜃 for which the variance is equal to CRB and
1
consequently, 𝑠 2 is most efficient . Let 𝑠′2 = ∑𝑛1(𝑋𝑖 − ̅̅̅̅̅
𝑋̅)2
(𝑛−1)
2𝜃 2
Then 𝐸 (𝑆′2 ) = 𝜃 and 𝑉(𝑆′2 ) = 𝑛−1 so that the efficiency of 𝑠′2 is given by
2𝜃 2 /𝑛
Eff(s′2 ) =
2𝜃 2/(𝑛−1)
n−1
=
n
Asymptotic efficiency: As distinct from the above definition of efficiency, we may define efficiency asymptotically. Let us confine ourselves to consistent estimators which are asymptotically normally distributed. Among this class, the estimator with the minimum asymptotic variance is called the 'most efficient estimator'. It is also called the best asymptotically normal (BAN) or consistent asymptotically normal efficient (CANE) estimator. If we denote by avar(T*) the asymptotic variance of a BAN estimator T*, then the efficiency of any other estimator T (within the class of asymptotically normal estimators) is defined by
Eff(T/T*) = avar(T*)/avar(T).
Example: Let (x1, …, xn) be a random sample from a normal distribution N(μ, σ). Consider the most efficient estimator x̄ and the sample median x̃me. It can be shown that both are CAN estimators. We have
V(x̄) = σ²/n
and V(x̃me) ≈ (π/2)(σ²/n)
so that
Eff(x̃me) = 2/π.
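The 2/π figure is easy to reproduce by simulation. A minimal Python sketch (parameters illustrative, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 0.0, 1.0, 101, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
v_mean = x.mean(axis=1).var()
v_med = np.median(x, axis=1).var()

print(v_mean / v_med, 2 / np.pi)   # efficiency of the median ~ 2/pi
```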
Example: Let T1, T2 be two unbiased estimators of θ having the same variance. Show that the correlation coefficient ρ between T1, T2 cannot be smaller than 2e − 1, where e is the efficiency of each estimator.
Soln: If T₀ denotes the most efficient estimator, then
V(T1) = V(T2) = V(T₀)/e.
Consider T = (T1 + T2)/2, which is unbiased. Its variance is
V(T) = (1/4)[V(T1) + V(T2) + 2ρ√(V(T1)V(T2))] = ((1 + ρ)/(2e)) V(T₀)
Since V(T) ≥ V(T₀),
(1 + ρ)/(2e) ≥ 1, or ρ ≥ 2e − 1.
Example: Let T₀ be a UMVUE (or most efficient estimator) and let T1 be an unbiased estimator with efficiency e. If ρ is the correlation coefficient between T₀ and T1, show that ρ = √e.
Soln: We have e = V(T₀)/V(T1). Let T be the linear combination of T₀, T1 which is unbiased and has minimum variance. Then T is also unbiased, having variance
V(T) = V(T₀)V(T1)(1 − ρ²) / [V(T₀) + V(T1) − 2ρ√(V(T₀)V(T1))]
= (1 − ρ²)V(T₀) / (1 − 2ρ√e + e)
= (1 − ρ²)V(T₀) / [(1 − ρ²) + (√e − ρ)²].
Since (1 − ρ²) and (√e − ρ)² are both non-negative, V(T) ≤ V(T₀); but since T₀ is the UMVUE, V(T) ≥ V(T₀). Therefore V(T) = V(T₀), which forces (√e − ρ)² = 0, i.e. ρ = √e.
SUFFICIENCY CRITERION:
A preliminary choice among statistics for estimating θ, before looking for a UMVUE or a BAN estimator, can be made on the basis of another criterion suggested by R. A. Fisher. This is called the 'sufficiency' criterion.
Definition: Let (x1, …, xn) be a random sample from the distribution of X having p.d.f. f(x, θ), θ ∈ Ω. A statistic T = T(x1, …, xn) is defined to be a sufficient statistic if and only if the conditional distribution of (x1, …, xn) given T = t does not depend on θ, for any value t.
[Note: In such a case, if we know the value of the sufficient statistic T, then the sample values are not needed to tell us anything more about θ. Also, the conditional distribution of any other statistic given T is independent of θ.]
A necessary and sufficient condition for T to be sufficient for θ is that the joint p.d.f. of (x1, …, xn) should be of the form
f((x1, …, xn); θ) = g(T, θ) · h(x1, …, xn)
where the first term on the r.h.s. depends on θ and on the sample only through T, and the second term is independent of θ. This is known as Neyman's Factorization Theorem, which provides a simple method of judging whether a statistic T is sufficient.
Remark: Any one-to-one function of a sufficient statistic is also a sufficient statistic.
Example: Consider n Bernoulli trials with probability of success p. The associated Bernoulli random variables (x1, …, xn) have joint distribution
f((x1, …, xn), p) = p^(Σxi)(1 − p)^(n−Σxi) = g(Σ xi, p) · h(x1, …, xn)
where g(Σ xi, p) = p^(Σxi)(1 − p)^(n−Σxi) and h(x1, …, xn) = 1. Hence Σ xi is sufficient for p.
Example: For the Poisson distribution
f(x, λ) = e^(−λ) λˣ/x!, x = 0, 1, …
f((x1, …, xn), λ) = e^(−nλ) λ^(Σxi) / ∏ xi! = g(Σ xi, λ) · h(x1, …, xn)
where g(Σ xi, λ) = e^(−nλ) λ^(Σxi) and h(x1, …, xn) = 1/∏ xi!.
Hence Σ xi (or Σ xi/n) is sufficient for λ.
Example: Let (x1, …, xn) be a random sample from a normal population N(μ, σ).
Case I (σ = σ₀ known):
f((x1, …, xn), μ) = (1/(σ₀√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ₀²))
= (1/(σ₀√(2π))ⁿ) e^(−[Σxi² + nμ² − 2nx̄μ]/(2σ₀²))
= g(x̄, μ) · h(x1, …, xn)
where g(x̄, μ) = e^(−n(μ² − 2x̄μ)/(2σ₀²)) and h(x1, …, xn) = (1/(σ₀√(2π))ⁿ) e^(−Σxi²/(2σ₀²)). Hence x̄ is sufficient for μ.
Case II (μ = μ₀ known):
f((x1, …, xn), σ) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ₀)²/(2σ²)) = g(Σ(xi − μ₀)², σ) · h(x1, …, xn)
where g(Σ(xi − μ₀)², σ) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ₀)²/(2σ²)) and h(x1, …, xn) = 1. Hence Σ(xi − μ₀)² is sufficient for σ.
Case III (both μ and σ unknown):
f((x1, …, xn), μ, σ) = (1/(σ√(2π))ⁿ) e^(−[Σxi² − 2μΣxi + nμ²]/(2σ²))
which shows that (Σ xi, Σ xi²) are jointly sufficient for (μ, σ). Similarly, [x̄, Σ(xi − x̄)²/(n−1)] are also sufficient for (μ, σ).
Example: Let (x1, …, xn) be a random sample from the gamma distribution with p.d.f.
f(x, θ, p) = (1/(θᵖ Γ(p))) x^(p−1) e^(−x/θ), x ≥ 0.
We have
f((x1, …, xn), θ, p) = (1/(θ^(np) Γ(p)ⁿ)) e^(−Σxi/θ) (∏ xi)^(p−1)
Case I (p known): we can write
f((x1, …, xn), θ) = [(1/θ^(np)) e^(−Σxi/θ)] [(1/Γ(p)ⁿ)(∏ xi)^(p−1)]
so Σ xi is sufficient for θ.
Case II (θ known): we can write
f((x1, …, xn), p) = [(1/(θ^(np) Γ(p)ⁿ)) (∏ xi)^(p−1)] [e^(−Σxi/θ)]
so ∏ xi is sufficient for p.
Case III (both θ and p unknown): it is seen that (Σ xi, ∏ xi) are jointly sufficient for (θ, p).
Example: For the exponential distribution f(x, θ) = (1/θ) e^(−x/θ), x ≥ 0, the joint p.d.f. is (1/θⁿ) e^(−Σxi/θ), so Σ xi (or x̄) is sufficient for θ.
Example: Let (x1, …, xn) be a random sample from the distribution with p.d.f.
f(x, θ) = θ x^(θ−1), 0 ≤ x ≤ 1.
We have
f((x1, …, xn), θ) = θⁿ (∏ xi)^(θ−1) = [θⁿ (∏ xi)^θ] [1/∏ xi]
so ∏ xi is sufficient for θ.
Example: Let (x1, …, xn) be a random sample from the double exponential distribution with p.d.f.
f(x, θ) = (1/2) e^(−|x−θ|), −∞ < x < ∞.
We have
f((x1, …, xn), θ) = (1/2ⁿ) e^(−Σ|xi−θ|)
For no single statistic T is it possible to express the above in the form g(T, θ) h(x1, …, xn). Hence there exists no statistic T which, taken alone, is sufficient for θ. However, the whole set (x1, …, xn), or the set of order statistics (x(1), …, x(n)), is jointly sufficient for θ.
Example: Let (x1, …, xn) be a random sample from the rectangular distribution R(0, θ) having p.d.f.
f(x, θ) = 1/θ, 0 ≤ x ≤ θ.
We have
f((x1, …, xn), θ) = (1/θⁿ) ∏ I_[0,θ](xi)
where I_A(x) = 1 if x ∈ A and 0 if x ∉ A. Since ∏ I_[0,θ](xi) = I_[0,θ](x(n)) I_[0,∞)(x(1)), where x(1) and x(n) are the minimum and maximum of the sample values (x1, …, xn), we can write
f((x1, …, xn), θ) = g(x(n), θ) · h(x1, …, xn)
where g(x(n), θ) = (1/θⁿ) I_[0,θ](x(n)) and h(x1, …, xn) = I_[0,∞)(x(1)). Hence x(n) is sufficient for θ.
Example: If X has p.d.f.
f(x, θ) = 1/θ, −θ ≤ x ≤ 0
then
f((x1, …, xn), θ) = (1/θⁿ) I_[−θ,∞)(x(1)) I_(−∞,0](x(n))
so that x(1) is sufficient for θ.
Example: Let (x1, …, xn) be a random sample from the rectangular distribution R(θ1, θ2) having p.d.f.
f(x, θ1, θ2) = 1/(θ2 − θ1) for θ1 ≤ x ≤ θ2, and 0 elsewhere.
Then
f((x1, …, xn), θ1, θ2) = (1/(θ2 − θ1)ⁿ) ∏ I_[θ1,θ2](xi)
where I_[θ1,θ2](xi) = 1 if θ1 ≤ xi ≤ θ2 and 0 elsewhere. We can write this as
f((x1, …, xn), θ1, θ2) = [(1/(θ2 − θ1)ⁿ) I_[θ1,∞)(x(1)) I_(−∞,θ2](x(n))] · h(x1, …, xn)
where h((x1, …, xn)) = 1. Hence (x(1), x(n)) are jointly sufficient for (θ1, θ2).
Example: Let (x1, …, xn) be a random sample from R(−θ, θ) with p.d.f.
f(x, θ) = 1/(2θ), −θ ≤ x ≤ θ.
Then
f((x1, …, xn), θ) = (1/(2θ)ⁿ) ∏ I_[−θ,θ](xi) = (1/(2θ)ⁿ) I_[−θ,∞)(x(1)) I_(−∞,θ](x(n))
so that max(−x(1), x(n)) = max |xi| is sufficient for θ.
Example: (x(1), x(n)) are jointly sufficient for θ in R(θ − 1/2, θ + 1/2) and in R(θ, θ + 1).
Example: For the shifted exponential distribution f(x, λ, θ) = λ e^(−λ(x−θ)), x ≥ θ,
f((x1, …, xn), λ, θ) = λⁿ e^(−λΣ(xi−θ)) ∏ I_[θ,∞)(xi)
= {e^(λnθ) I_[θ,∞)(x(1))} {λⁿ e^(−λΣxi)}
so x(1) is sufficient for θ when λ is known, and (x(1), Σ xi) are jointly sufficient for (θ, λ).
METHODS OF ESTIMATION:
Four important methods of obtaining estimators are (I) the method of moments, (II) the method of maximum likelihood, (III) the method of minimum χ², and (IV) the method of least squares.
(I) Method of moments
Suppose the distribution of a random variable X has k parameters (θ1, θ2, …, θk) which have to be estimated. Let μ′r = E(Xʳ) denote the r-th moment of X about the origin. In general μ′r is a known function of θ1, …, θk, say μ′r = μ′r(θ1, …, θk). Let (x1, …, xn) be a random sample from the distribution of X and let mr = Σ xiʳ/n be the r-th sample moment. Form the equations
μ′r(θ1, …, θk) = mr, r = 1, 2, …, k
whose solution, say (θ̂1, …, θ̂k), gives the estimates: θ̂i is the estimate of θi (i = 1, …, k). These are the method-of-moments estimators of the parameters.
Example: Let X ~ N(μ, σ). Here
μ′1 = μ, μ′2 = σ² + μ².
The equations are
x̄ = μ, Σ xi²/n = σ² + μ²
which give
μ̂ = x̄, σ̂ = √(Σ xi²/n − x̄²) = √(Σ(xi − x̄)²/n).
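A minimal Python sketch of the normal case (function name and generated data are illustrative, not part of the notes):

```python
import numpy as np

def mom_normal(x):
    """Method-of-moments estimates for N(mu, sigma^2): equate the
    first two sample moments to mu and sigma^2 + mu^2."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    sigma_hat = np.sqrt((x**2).mean() - mu_hat**2)  # = sqrt(mean((x - xbar)^2))
    return mu_hat, sigma_hat

rng = np.random.default_rng(5)
print(mom_normal(rng.normal(10.0, 3.0, size=10_000)))  # ~ (10.0, 3.0)
```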
Example: Let X ~ P(λ) and let (x1, …, xn) be a random sample from P(λ). Here μ′1 = λ, and the equation x̄ = λ gives λ̂ = x̄.
Example: For the exponential distribution f(x, θ) = θ e^(−θx), x ≥ 0,
μ′1 = 1/θ
so the equation x̄ = 1/θ gives θ̂ = 1/x̄.
Remarks: (i) The method-of-moments estimators are not uniquely defined; we may equate any k of the sample moments to the corresponding population moments.
(ii) These estimators are not, in general, efficient; they will be so only if the parent distribution is of a suitable form.
(iii) When population moments do not exist (e.g. the Cauchy population) this method of estimation is inapplicable.
(II) Method of maximum likelihood
For a given sample (x1, …, xn), f(x1, …, xn; θ) may be looked upon as a function of θ, which is called the likelihood function of θ and is denoted by L(θ) = L(θ; x1, …, xn). It gives the likelihood that the random variables (x1, …, xn) assume the observed set of values.
We want to know for which distribution (i.e. for what value of θ) the likelihood is largest for this set of observations. In other words, we want to find the value of θ, denoted by θ̂, which maximizes L(x1, …, xn; θ). The value θ̂ maximizing the likelihood function is, in general, a function of x1, …, xn, say
θ̂ = θ̂(x1, …, xn)
called the maximum likelihood estimator (m.l.e.).
In many cases it is more convenient to deal with log L(θ) rather than L(θ), since log L(θ) is maximized for the same value of θ as L(θ). For obtaining the m.l.e. we find the value of θ for which
(∂/∂θ) log L(θ) = 0 ……(1)
We must, however, check that this provides the absolute maximum. If the derivative does not exist at θ = θ̂, or equation (1) is not solvable, this method of solving (1) will fail.
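In the examples below the likelihood equation solves in closed form; when it does not, the maximization can be done numerically. A minimal Python sketch using scipy (the gamma model is an illustrative choice, not one of the worked examples):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.default_rng(6)
data = gamma.rvs(a=2.5, scale=1.7, size=500, random_state=rng)

def neg_log_lik(params):
    a, scale = params
    if a <= 0 or scale <= 0:
        return np.inf               # keep the search inside the parameter space
    return -gamma.logpdf(data, a=a, scale=scale).sum()

res = minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)                         # m.l.e. of (shape, scale), near (2.5, 1.7)
```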
Example: For the Bernoulli distribution f(x, θ) = θˣ(1 − θ)¹⁻ˣ, x = 0, 1, the likelihood is
L(θ; x1, …, xn) = θ^(Σxi)(1 − θ)^(n−Σxi)
Setting (∂/∂θ) log L(θ) = 0:
Σxi/θ − (n − Σxi)/(1 − θ) = 0
or (Σxi − nθ)/(θ(1 − θ)) = 0
or θ = Σxi/n = x̄.
The m.l.e. of θ is θ̂ = x̄.
Example: For the Poisson distribution f(x, λ) = e^(−λ)λˣ/x!, x = 0, 1, 2, …,
L(λ; x1, …, xn) = e^(−nλ) λ^(Σxi) / ∏ xi!
(∂/∂λ) log L(λ) = −n + Σxi/λ = 0
so the m.l.e. of λ is λ̂ = x̄.
Example: For the zero-truncated binomial distribution
f(x, θ) = C(2, x) θˣ(1 − θ)^(2−x) / [1 − (1 − θ)²], x = 1, 2 (0 < θ < 1)
the likelihood is
L(θ; x1, …, xn) = ∏ C(2, xi) · θ^(Σxi)(1 − θ)^(2n−Σxi) / [1 − (1 − θ)²]ⁿ
and
log L(θ) = Σ log C(2, xi) + (Σxi) log θ + (2n − Σxi) log(1 − θ) − n log[1 − (1 − θ)²]
(∂/∂θ) log L(θ) = Σxi/θ − (2n − Σxi)/(1 − θ) − 2n(1 − θ)/[1 − (1 − θ)²] = 0
Using 1 − (1 − θ)² = θ(2 − θ), this simplifies to Σxi(2 − θ) = 2n,
or 2 − θ = 2/x̄, or θ = 2 − 2/x̄.
The m.l.e. is θ̂ = 2 − 2/x̄.
Example: Let (x1, …, xn) be a random sample from N(μ, σ) with σ known. Then
L(μ; x1, …, xn) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ²))
(∂/∂μ) log L(μ) = Σ(xi − μ)/σ² = 0
so the m.l.e. of μ is μ̂ = x̄.
Example: Let (x1, …, xn) be a random sample from N(μ, σ) with μ known. Then
L(σ; x1, …, xn) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ²))
log L(σ) = −n log σ − (n/2) log(2π) − Σ(xi − μ)²/(2σ²)
(∂/∂σ) log L(σ) = −n/σ + Σ(xi − μ)²/σ³
Equating to zero we get σ = √(Σ(xi − μ)²/n), so the m.l.e. of σ is σ̂ = √(Σ(xi − μ)²/n).
Example: When both μ and σ are unknown,
L(μ, σ; x1, …, xn) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ²))
log L(μ, σ) = −n log σ − (n/2) log(2π) − Σ(xi − μ)²/(2σ²)
(∂/∂μ) log L(μ, σ) = Σ(xi − μ)/σ²
(∂/∂σ) log L(μ, σ) = −n/σ + Σ(xi − μ)²/σ³
Equating both derivatives to zero and solving the equations we get μ = x̄ and σ = √(Σ(xi − x̄)²/n).
The m.l.e.s are μ̂ = x̄ and σ̂ = √(Σ(xi − x̄)²/n).
Example: For f(x, θ) = (1/θ) e^(−x/θ), x ≥ 0,
L(θ; x1, …, xn) = (1/θⁿ) e^(−Σxi/θ)
(∂/∂θ) log L(θ) = −n/θ + Σxi/θ²
Equating to zero gives the m.l.e. θ̂ = x̄.
Example: For f(x, θ) = e^(−(x−θ)), x ≥ θ, we have log L(θ) = −Σxi + nθ. If we differentiate log L(θ) w.r.t. θ and equate to zero we get n = 0, which yields no solution. Now L(θ) is an increasing function of θ, so it is maximized by choosing the maximum value of θ subject to the condition θ ≤ x(1); hence θ̂ = x(1).
Example: X has p.d.f.
f(x, λ, θ) = λ e^(−λ(x−θ)), x ≥ θ.
The m.l.e.s are θ̂ = x(1) and
λ̂ = 1/(x̄ − x(1)).
Example: For f(x, θ) = θ x^(θ−1), 0 ≤ x ≤ 1,
L(θ; x1, …, xn) = θⁿ (∏ xi)^(θ−1)
(∂/∂θ) log L(θ) = n/θ + Σ log xi
Equating to zero we get θ = −n/Σ log xi (note that Σ log xi < 0), so the m.l.e. is
θ̂ = −n/Σ log xi.
Example: For the rectangular distribution f(x, θ) = 1/θ, 0 ≤ x ≤ θ,
L(θ; x1, …, xn) = 1/θⁿ for 0 ≤ x(1) ≤ ⋯ ≤ x(n) ≤ θ
which is a decreasing function of θ; it is maximized by the smallest θ satisfying the constraint, so the m.l.e. of θ is θ̂ = x(n).
Example: For f(x, θ) = 1/(2θ), −θ ≤ x ≤ θ,
L(θ; x1, …, xn) = 1/(2ⁿθⁿ) for −θ ≤ x(1) ≤ ⋯ ≤ x(n) ≤ θ
so the m.l.e. of θ is θ̂ = max(−x(1), x(n)).
Example: For f(x, θ1, θ2) = 1/(θ2 − θ1), θ1 ≤ x ≤ θ2,
L(θ1, θ2; x1, …, xn) = 1/(θ2 − θ1)ⁿ for θ1 ≤ x(1) ≤ ⋯ ≤ x(n) ≤ θ2.
This is maximized when (θ2 − θ1) is minimum, i.e. θ1 is maximum and θ2 is minimum subject to the condition
θ1 ≤ x(1) ≤ ⋯ ≤ x(n) ≤ θ2.
We have to take θ1 = x(1) and θ2 = x(n), so the m.l.e.s of θ1 and θ2 are θ̂1 = x(1) and θ̂2 = x(n).
Example: For f(x, θ) = 1/(2c), θ − c ≤ x ≤ θ + c (c known),
L(θ; x1, …, xn) = 1/(2c)ⁿ for θ − c ≤ x(1) ≤ ⋯ ≤ x(n) ≤ θ + c
which is maximized for any θ such that θ − c ≤ x(1) and x(n) ≤ θ + c.
This shows that any statistic which lies between x(n) − c and x(1) + c, e.g. (x(1) + x(n))/2, is an m.l.e. of θ; the m.l.e. is not unique here.
Example: If X has R(θ, θ + 1), any statistic which lies between x(n) − 1 and x(1) is an m.l.e. of θ.
Example: For f(x, θ) = 1/θ, θ ≤ x ≤ 2θ,
L(θ; x1, …, xn) = 1/θⁿ for θ ≤ x(1) ≤ ⋯ ≤ x(n) ≤ 2θ.
The constraints are
θ ≤ x(1) ……(i) and x(n) ≤ 2θ, i.e. x(n)/2 ≤ θ ……(ii)
Since L(θ) is decreasing in θ, the m.l.e. is the minimum value of θ satisfying (i) and (ii), so that
θ̂ = x(n)/2.
Example: For f(x, θ) = 1/(2θ), −θ ≤ x ≤ θ,
L(θ; x1, …, xn) = 1/(2θ)ⁿ for −θ ≤ x(1) ≤ ⋯ ≤ x(n) ≤ θ
so the m.l.e. of θ is θ̂ = max(−x(1), x(n)).
Example: For the double exponential distribution f(x, θ) = (1/2) e^(−|x−θ|), −∞ < x < ∞,
L(θ; x1, …, xn) = (1/2ⁿ) e^(−Σ|xi−θ|)
which is maximized by minimizing Σ|xi − θ|; this sum is minimized by the sample median, so the m.l.e. of θ is θ̂ = x̃me, the sample median.
Example: Let xr ~ N(rθ, r³σ²) independently for r = 1, …, n, with σ known. Then
L(θ; x1, …, xn) = ∏_{r=1}^n (1/√(2πr³σ²)) e^(−(xr−rθ)²/(2r³σ²))
log L(θ) = const − (1/(2σ²)) Σ (xr − rθ)²/r³
(∂/∂θ) log L(θ) = (1/σ²) Σ_{r=1}^n (xr − rθ)/r²
Equating to zero,
θ = (Σ xr/r²)/(Σ 1/r)
so the m.l.e. of θ is
θ̂ = (Σ xr/r²)/(Σ 1/r)
We have E(θ̂) = θ and V(θ̂) = σ²/Σ(1/r).
Optimum properties of the m.l.e.: (i) (Invariance) If θ̂ is the m.l.e. of θ and Ψ(θ) is a single-valued function of θ with a single-valued inverse, then Ψ(θ̂) is the m.l.e. of Ψ(θ).
(iii) Suppose f(x, θ) satisfies certain regularity conditions and θ̂n = θ̂n(x1, …, xn) is the m.l.e. based on a sample of size n. Then
(a) θ̂n is consistent for θ;
(b) θ̂n is asymptotically normal, with asymptotic variance 1/(n E[(∂/∂θ) log f(x, θ)]²);
(c) the sequence of estimators θ̂n has the smallest asymptotic variance among all consistent, asymptotically normal estimators, i.e. the m.l.e. is a BAN estimator.
(III) Method of minimum χ²: Suppose the distribution of X involves parameters θ = (θ1, …, θr) to be estimated. Suppose S1, S2, …, Sk are k mutually exclusive classes which form a partition of the range of X, and let pj(θ) = P(X ∈ Sj), where Σ_{j=1}^k pj(θ) = 1.
Suppose, in practice, corresponding to a random sample of n observations from the distribution of X we are given the frequencies (N1, …, Nk), where Nj = observed number of sample observations falling in Sj. Calculate
χ² = Σ_{j=1}^k [nj − n pj(θ)]² / (n pj(θ))
where nj is the observed value of Nj (j = 1, 2, …, k). Evidently χ² will be a function of θ (or of θ1, …, θr). To obtain the estimator of θ we minimise χ² w.r.t. θ; the minimum χ² estimator of θ is that θ̂ which minimises χ². The equation(s) for determining the estimator(s) by this method are
∂χ²/∂θ = 0 (or ∂χ²/∂θi = 0, i = 1, …, r).
Remarks:
(i) The modified minimum χ² replaces the denominator n pj(θ) by nj (if nj = 0, unity is used). The modified minimum χ² estimator of θ is the θ̂ which minimises the modified χ².
(ii) For large n, the minimum χ² and likelihood equations are identical and, consequently, provide the same estimators.
(iii) The minimum χ² estimators are consistent, asymptotically normal and efficient.
Example: For the Bernoulli distribution take Nj = the number of observations equal to j, for j = 0, 1. Here the range of X is partitioned into the two classes {0} and {1}, with
p0(θ) = P(X = 0) = 1 − θ, p1(θ) = P(X = 1) = θ
and
χ² = Σ_{j=0}^1 [nj − n pj(θ)]²/(n pj(θ)) = ([n1 − nθ]²/n) · 1/(θ(1 − θ))
which is minimised by θ̂ = n1/n, agreeing with the m.l.e.
(IV) METHOD OF LEAST SQUARES: Suppose Y is a random variable whose value depends on the value of a (non-random) variable x. For example, the weight (Y) of a baby depends on its age (x), the temperature (Y) of a place at a given time depends on its altitude (x), the salary (Y) of an individual at a given age depends on the number of years (x) of formal education which he has had, the maintenance cost (Y) per year of an automobile depends on its age (x), etc.
We assume that the distribution of the r.v. Y is such that, for a given x, E(Y|x) is a linear function of x, while the variance and higher moments of Y are independent of x. That is, we assume the linear model
E(Y|x) = α + βx
or Y = α + βx + ε, with E(ε) = 0.
The problem is to estimate the parameters α and β on the basis of a random sample of n observations. The method of least squares specifies that we should take as our estimates of α and β the values which minimise the sum of squares
Σ_{i=1}^n [yi − α − βxi]²
where yi is the observed value of Y and xi the associated value of x. Minimising this sum, the least squares estimates are
β̂ = Σ(yi − ȳ)(xi − x̄) / Σ(xi − x̄)²
and α̂ = ȳ − β̂x̄.
Remarks: The least squares estimators do not have any general optimum properties, even asymptotically. However, in linear estimation this method provides good estimators in small samples: the least squares estimators are unbiased and have minimum variance among all linear unbiased estimators.
TESTING OF HYPOTHESIS
The sample (x1, …, xn) takes values, in general, in the n-dimensional real space Rⁿ; the parameter space Ω is the set of all possible values of the parameter θ.
Definition: A statistical hypothesis is a statement about the parameter θ of the form H: θ ∈ ω (⊂ Ω), e.g.
H: θ = θ₀, or H: θ ≥ θ₀, or H: θ ≠ θ₀, or H: θ1 < θ < θ2.
Definition: If a hypothesis specifies an exact value of the parameter θ, it is called a simple hypothesis. If a hypothesis does not fully specify the value of θ (but gives a set of possible values only), it is called a composite hypothesis.
Definition: The hypothesis which is being actually tested is called the null hypothesis, and the hypothesis which is stated as the alternative to the null hypothesis is called the alternative hypothesis. For example, the null hypothesis may be H₀: θ = θ₀ and the alternative may be H₁: θ ≠ θ₀ or H₁: θ > θ₀ or H₁: θ < θ₀ etc. Both the null and the alternative hypothesis may be simple or composite; for our study, we shall usually take the null hypothesis to be simple.
Suppose we want to test a null hypothesis H₀ against an alternative hypothesis H₁ on the basis of a random sample E = (X1, …, Xn), in the sense that we have to decide when to reject or accept H₀. A test T is a rule or procedure for deciding when to reject or accept H₀ on the basis of the sample E = (X1, …, Xn). It specifies a partition of the sample space Rⁿ into two disjoint subsets W and W̄ = Rⁿ − W such that H₀ is rejected when E ∈ W and accepted when E ∈ W̄.
Definition: The set W corresponding to a test T, i.e. the set such that we reject H₀ when E ∈ W, is called the critical region of the test.
Two types of error: In a testing problem we are liable to commit two types of error. Suppose H₀ is true; a type I error is committed if we reject the null hypothesis when it is actually true. On the other hand, suppose H₀ is false and H₁ is true; a type II error is committed if we accept the null hypothesis when it is actually false. We denote by α and β the probabilities of type I and type II error:
α = P{E ∈ W | θ ∈ H₀}
β = P{E ∈ W̄ | θ ∈ H₁}
Definition: The probability of type I error for a test T, denoted by α, is called the 'size' or level of significance of the test.
Remark: If H₀ is simple (say H₀: θ = θ₀), α is clearly defined; when H₀ is composite (say H₀: θ ∈ ω) we take α = sup_{θ∈ω} P{E ∈ W | θ}.
Definition: For a test T having the critical region W, the power function P_T(θ) is defined by
P_T(θ) = P{Reject H₀ | θ} = P_θ{E ∈ W}
as a function of θ. Evidently,
P_T(θ) = α for θ ∈ H₀
P_T(θ) = 1 − β for θ ∈ H₁.
If we could find a test of the given hypothesis for which both α and β are minimum, it would be the best. Unfortunately, it is not possible to minimise both errors simultaneously for a fixed sample size. For example, if T1 always rejects H₀, i.e. its critical region is W1 = Rⁿ, then for T1, α = 1 and β = 0; while if T2 always accepts H₀, i.e. its critical region is W2 = ∅, then for T2, α = 0 and β = 1. This shows that as the probability of type I error becomes minimum, the probability of type II error becomes maximum, and vice versa. What is done is to fix α, taking α to be quite small (in practice α = .05 or .01), so that only tests of size α are considered. Among all tests of a given size α, comparison is made on the basis of their power functions. If T and T′ are two tests (for the same testing problem) of the same size α, T is said to be better than T′ if its power is greater than the power of T′ for all alternative hypotheses (equivalently, the probability of type II error for T is less than that for T′).
Suppose first that both hypotheses are simple:
H₀: θ = θ₀
H₁: θ = θ₁ (≠ θ₀)
Definition: A test T* is called a most powerful (MP) test of size α (0 < α < 1) if and only if its probability of type I error is equal to α and its power P_{T*}(θ₁) is not less than the power P_T(θ₁) of every other test T of size α, i.e.
(i) P_{T*}(θ₀) = α
(ii) P_{T*}(θ₁) ≥ P_T(θ₁) for every test T of size α.
[This means that the probability of type II error for T* is less than or equal to that of any other test.]
For a composite alternative, say H₁: θ ≠ θ₀:
Definition: A test T* is called a uniformly most powerful (UMP) test of size α (0 < α < 1) if its probability of type I error is equal to α and its power function is such that
P_{T*}(θ) ≥ P_T(θ) for all θ ∈ H₁ and all other tests T of size α.
Example: Let X have the exponential distribution f(x, θ) = θe^(−θx) (x ≥ 0) and suppose we test
H₀: θ = 2 against H₁: θ = 1.
Let the sample consist of only one observation X and consider two tests T and T′ with associated critical regions {X ≥ 1} and {X ≤ 0.07} respectively. For T:
α = P{X ≥ 1 | θ = 2} = 2∫₁^∞ e^(−2x) dx = 0.135
β = P{X < 1 | θ = 1} = ∫₀¹ e^(−x) dx = 0.632
For T′:
α = P{X ≤ 0.07 | θ = 2} = 2∫₀^0.07 e^(−2x) dx ≈ 0.135
β = P{X > 0.07 | θ = 1} = ∫_0.07^∞ e^(−x) dx ≈ 0.932
The two tests have (approximately) the same size, but T has the smaller probability of type II error and is therefore the better test.
Example: A two-faced coin is tossed six times; the probability of getting a head in a toss is θ and of a tail is (1 − θ). It is required to test the hypothesis
H₀: θ = θ₀ = 1/2 against H₁: θ = θ₁ = 2/3.
If the test consists in rejecting H₀ when heads appear more than four times, and accepting H₀ otherwise, then
α = P{Rej H₀ | θ = θ₀} = [C(6,5) + C(6,6)]/2⁶ = 7/2⁶
β = 1 − P{Rej H₀ | θ = θ₁} = 1 − [C(6,5)(2/3)⁵(1/3) + (2/3)⁶] = 1 − 2⁸/3⁶.
Example: Let X have the exponential distribution
f(x, θ) = (1/θ) e^(−x/θ), x ≥ 0.
It is required to test H₀: θ = 1 against H₁: θ = 4. Find α and β for the test having critical region C = {X > 3} on the basis of one sample observation.
α = P{X > 3 | θ = 1} = ∫₃^∞ e^(−x) dx = e^(−3)
β = P{Acc H₀ | θ = 4} = 1 − (1/4)∫₃^∞ e^(−x/4) dx = 1 − e^(−3/4)
Power = 1 − β = e^(−3/4).
Example: Let X have the rectangular distribution
f(x, θ) = 1/θ, 0 ≤ x ≤ θ
and test H₀: θ = 1 against H₁: θ = 2. Suppose one observation is taken, and consider the tests having the critical regions (a) C1 = {x ≥ 0.7} and (b) C2 = {0.8 ≤ x ≤ 1.3}.
(a) α = P{Rej H₀ | θ = 1} = P{X ≥ 0.7 | θ = 1} = ∫_0.7^1 1 dx = 0.3
β = P{Acc H₀ | θ = 2} = ∫₀^0.7 (1/2) dx = 0.35
(b) α = P{0.8 ≤ X ≤ 1.3 | θ = 1} = ∫_0.8^1 1 dx = 0.2
Power = P{0.8 ≤ X ≤ 1.3 | θ = 2} = ∫_0.8^1.3 (1/2) dx = 0.25, or β = 0.75.
Example: Let X have the binomial distribution
f(x, p) = C(10, x) pˣ(1 − p)^(10−x), x = 0, 1, …, 10.
One observation X is taken for testing H₀: p = 1/2 against H₁: p = 1/4. Find α and β for the test which rejects H₀ when X ≤ 3.
α = P{X ≤ 3 | p = 1/2} = Σ_{x=0}^3 C(10, x)(1/2)ˣ(1/2)^(10−x) = (1 + 10 + 45 + 120)/2¹⁰ = 11/64
β = 1 − P{X ≤ 3 | p = 1/4} = 1 − Σ_{x=0}^3 C(10, x)(1/4)ˣ(3/4)^(10−x) = 1 − 31·3⁸/4⁹.
Example: Let X have a Poisson distribution P(λ) and let it be required to test the hypothesis H₀: λ = 1 vs H₁: λ = 2. One observation is taken and a test is considered which rejects H₀ when X ≥ 3. Find α and β.
α = P{X ≥ 3 | λ = 1} = 1 − Σ_{x=0}^2 e^(−1)/x! = 1 − [1/e + 1/e + 1/(2e)] = 1 − 5/(2e)
β = P{X ≤ 2 | λ = 2} = Σ_{x=0}^2 e^(−2)2ˣ/x! = (1/e²)[1 + 2 + 2] = 5/e².
Now we are in a position to prove a theorem which helps us to obtain MP tests of a simple hypothesis against a simple alternative. In some special situations this also gives a UMP test.
Let us suppose that we are testing a simple hypothesis against a simple alternative
H₀: θ = θ₀ vs H₁: θ = θ₁ (≠ θ₀)
and write
L(θj) = L(θj; X1, …, Xn) = ∏_{i=1}^n f(Xi, θj), j = 0, 1
with L₀ = L(θ₀) and L₁ = L(θ₁).
Neyman–Pearson Lemma: The test whose critical region is
W = {E : L(θ₁)/L(θ₀) ≥ c}
where the constant c is determined by the size condition P{E ∈ W | θ₀} = α, is the MP test of size α.
Proof: Consider the test T with critical region W, and any other test T′ (with critical region W′) of size α. Since both are of size α we have
∫_W L₀ dx = α = ∫_{W′} L₀ dx ……(i)
Let
W1 = W − W∩W′, W2 = W∩W′, W3 = W′ − W∩W′.
Then from (i),
∫_{W1} L₀ dx = ∫_W L₀ dx − ∫_{W2} L₀ dx = ∫_{W′} L₀ dx − ∫_{W2} L₀ dx = ∫_{W3} L₀ dx ……(ii)
Inside W we have L₁ ≥ cL₀, and outside W we have L₁ < cL₀; hence
∫_{W1} L₁ dx ≥ c ∫_{W1} L₀ dx = c ∫_{W3} L₀ dx ……(iii)
and c ∫_{W3} L₀ dx ≥ ∫_{W3} L₁ dx ……(iv)
Combining (iii) and (iv) and adding ∫_{W2} L₁ dx to both sides,
∫_{W1∪W2} L₁ dx ≥ ∫_{W3∪W2} L₁ dx
or ∫_W L₁ dx ≥ ∫_{W′} L₁ dx
or P_T(θ₁) ≥ P_{T′}(θ₁)
which shows that T is at least as powerful as any other test T′ of size α. Hence T is the MP test.
Remarks: (1) The constant c for the MP test is determined by using the size condition
∫_W L₀ dx = α.
(2) When X is a discrete r.v. the constant c may not be unique. What is more important is that we may not be able to find an MP critical region of exact size α. To get rid of the difficulty the critical region is modified:
Reject H₀ if L(θ₁)/L(θ₀) > c
Reject H₀ with probability r if L(θ₁)/L(θ₀) = c
Accept H₀ if L(θ₁)/L(θ₀) < c
Then the size of the test is
P₀{L(θ₁)/L(θ₀) > c} + r P₀{L(θ₁)/L(θ₀) = c} = α.
For any given α, r can be determined. Such a test is called a randomized test.
Example: Let (x1, …, x5) be five Bernoulli observations and let us test H₀: θ = 0.6 vs H₁: θ = θ₁ (> 0.6). The MP test has critical region {Σ xi ≥ c}. From the tables of the binomial distribution we can tabulate, for θ = 0.6:

c   P(Σxi = c)   P(Σxi ≥ c)
0   0.01024      1.00000
1   0.07680      0.98976
2   0.23040      0.91296
3   0.34560      0.68256
4   0.25920      0.33696
5   0.07776      0.07776

As such, no non-randomized MP test of exact size α = 0.35 (or .05 or .01) exists. However, the randomized MP test of size α = 0.35 is:
Reject H₀ if Σ xi > 3
Reject H₀ with probability .01304/.34560 if Σ xi = 3
Accept H₀ if Σ xi < 3
since then the size is 0.33696 + (.01304/.34560)(.34560) = 0.35.
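The table and the randomization constant can be reproduced directly. A minimal Python sketch using scipy's binomial tail (illustrative, not part of the notes):

```python
from scipy.stats import binom

n, theta0, alpha = 5, 0.6, 0.35

for c in range(n + 1):
    # P(X = c) and P(X >= c) = P(X > c - 1) under theta0
    print(c, binom.pmf(c, n, theta0), binom.sf(c - 1, n, theta0))

# randomized test: reject if X > 3, reject with probability r if X = 3
r = (alpha - binom.sf(3, n, theta0)) / binom.pmf(3, n, theta0)
print(r)   # ~ 0.01304 / 0.34560
```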
(3) Suppose we test the simple hypothesis H₀: θ = θ₀ against a composite alternative H₁: θ ≠ θ₀ or H₁: θ > θ₀ or H₁: θ < θ₀. If the MP test for H₀: θ = θ₀ against H₁: θ = θ₁ given by the NP lemma does not depend on θ₁, the same test will be MP for all alternative values of θ and, therefore, it will be a UMP test.
Example: Let X have the Poisson distribution
f(x, λ) = e^(−λ)λˣ/x!, x = 0, 1, 2, …
We want to test H₀: λ = λ₀ against H₁: λ = λ₁ (> λ₀).
We have
L(λ) = ∏ f(xi, λ) = e^(−nλ) λ^(Σxi) / ∏ xi!
The MP critical region is W = {L(λ₁)/L(λ₀) ≥ C}, i.e. inside W we have
L(λ₁)/L(λ₀) = e^(−n(λ₁−λ₀)) (λ₁/λ₀)^(Σxi) ≥ C
or −n(λ₁ − λ₀) + (Σ xi) log(λ₁/λ₀) ≥ log C
or Σ xi ≥ k
where k = [log C + n(λ₁ − λ₀)]/log(λ₁/λ₀). Thus
W = {Σ xi ≥ k}.
We know that Σ xi has the Poisson distribution P(nλ), so k can be determined by solving
P(Σ xi ≥ k | λ = λ₀) = α.
Remarks: (i) For an alternative λ₁ < λ₀, the MP region is of the form {Σ xi ≤ k}. (ii) Since the critical region does not depend on the value of λ₁, the test is UMP for the alternative H₁: λ > λ₀. (iii) For getting an MP test of exact size α we may have to use a randomized test.
Example: Let X have the exponential density f(x, θ) = θe^(−θx) (x ≥ 0) and test H₀: θ = θ₀ vs H₁: θ = θ₁ (< θ₀). Here L(θ) = θⁿ e^(−θΣxi), and L(θ₁)/L(θ₀) = (θ₁/θ₀)ⁿ e^((θ₀−θ₁)Σxi) ≥ C gives, since θ₀ − θ₁ > 0, an MP region of the form {Σ xi ≥ k}.
Example: Let X ~ N(μ, σ) with σ known, and test H₀: μ = μ₀ vs H₁: μ = μ₁ (> μ₀). We have
L(μ) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ²))
The MP critical region is W = {L(μ₁)/L(μ₀) ≥ c}, i.e. inside W,
L(μ₁)/L(μ₀) = e^(−(1/(2σ²))[Σ(xi−μ₁)² − Σ(xi−μ₀)²]) ≥ c
or −(1/(2σ²))[Σ(xi − μ₁)² − Σ(xi − μ₀)²] ≥ log c
Since μ₁ > μ₀, this reduces to
x̄ ≥ k
where k collects the constants on the r.h.s. The constant k is found by solving
P_{μ₀}{x̄ ≥ k} = α
or P_{μ₀}{(x̄ − μ₀)/(σ/√n) ≥ (k − μ₀)/(σ/√n)} = α
or P_{μ₀}{Z ≥ (k − μ₀)/(σ/√n)} = α
Under H₀, Z = √n(x̄ − μ₀)/σ has N(0, 1), and the tables of the standard normal distribution provide the value k_α such that P[Z ≥ k_α] = α; then
(k − μ₀)/(σ/√n) = k_α, or k = μ₀ + k_α σ/√n.
The power of the test is
P_{μ₁}{x̄ ≥ k} = P_{μ₁}{(x̄ − μ₁)/(σ/√n) ≥ (k − μ₁)/(σ/√n)} = P{Z ≥ k_α + √n(μ₀ − μ₁)/σ}
Since (μ₀ − μ₁) < 0, this shows that the power is an increasing function of n.
Remarks: (ii) If μ₁ < μ₀, the MP test can be shown to have the critical region {x̄ ≤ k} where k = μ₀ − k_α σ/√n.
(iii) We observe that the MP test of H₀: μ = μ₀ vs H₁: μ = μ₁ (> μ₀) has a critical region which does not depend on μ₁; the same test will be UMP for testing H₀: μ = μ₀ against H₁: μ > μ₀. Similarly the MP test for μ₁ < μ₀ is UMP for H₁: μ < μ₀. However, it can be shown that there is no test which is UMP for H₀: μ = μ₀ against H₁: μ ≠ μ₀.
Example: Let X ~ N(μ, σ) with μ known. We want to test
H₀: σ = σ₀ vs H₁: σ = σ₁ (> σ₀).
We have
L(σ) = (1/((2π)^(n/2) σⁿ)) e^(−Σ(xi−μ)²/(2σ²))
Therefore the MP test has the critical region W = {L(σ₁)/L(σ₀) ≥ c}, i.e. inside W,
L(σ₁)/L(σ₀) = (σ₀/σ₁)ⁿ e^(−Σ(xi−μ)²(1/(2σ₁²) − 1/(2σ₀²))) ≥ c
or Σ(xi − μ)² (1/(2σ₀²) − 1/(2σ₁²)) ≥ log c + n log(σ₁/σ₀)
or Σ(xi − μ)² ≥ k
where k = 2[log c + n log(σ₁/σ₀)]/(1/σ₀² − 1/σ₁²).
Since Σ(xi − μ)²/σ² ~ χ²_n, we can determine k by solving
P_{σ₀}{Σ(xi − μ)²/σ₀² ≥ k/σ₀²} = α, or P{Y ≥ k/σ₀²} = α, where Y ~ χ²_n.
From the tables of χ²_n we can find k_α such that P{Y ≥ k_α} = α, so that k = σ₀² k_α. The power is
P_{σ₁}{Σ(xi − μ)²/σ₁² ≥ k/σ₁²} = P_{σ₁}{Y ≥ (σ₀²/σ₁²) k_α}, where Y ~ χ²_n.
Remarks: (ii) If σ₁ < σ₀, the MP test can be shown to have the critical region {Σ(xi − μ)² ≤ k′}.
(iii) Since the MP test of H₀: σ = σ₀ vs H₁: σ = σ₁ (> σ₀) does not depend on σ₁, it is UMP for testing H₀: σ = σ₀ against H₁: σ > σ₀. Similarly, the MP test for an alternative σ₁ < σ₀ is UMP for H₁: σ < σ₀.
Example: Let X have p.d.f. f(x, θ) = θ x^(θ−1) (0 ≤ x ≤ 1). We want to test
H₀: θ = θ₀ against H₁: θ = θ₁ (> θ₀).
The MP test has the critical region W = {L(θ₁)/L(θ₀) ≥ C}, i.e. inside W,
(θ₁/θ₀)ⁿ (∏ xi)^(θ₁−θ₀) ≥ C
or ∏ xi ≥ k, where k = [C(θ₀/θ₁)ⁿ]^(1/(θ₁−θ₀)).
Thus the critical region is
{∏ xi ≥ k}.
Since −2θ Σ log xi ~ χ²_{2n}, the constant k can be determined by solving
P_{θ₀}{−2θ₀ Σ log xi ≤ −2θ₀ log k} = α
where Y = −2θ₀ Σ log xi ~ χ²_{2n}.
Remark: In the same manner, for H₀: θ = θ₀ against H₁: θ = θ₁ (< θ₀) the MP test can be found.
Example: Suppose under H₀, X has the standard normal density f₀(x), and under H₁ the Cauchy density f₁(x) = 1/(π(1 + x²)). With a single observation, the MP critical region is
{x : f₁(x)/f₀(x) ≥ c}
or √(2/π) e^(x²/2)/(1 + x²) ≥ a.
Since the l.h.s. is a non-decreasing function of |x| (for |x| ≥ 1), the critical region is {|x| ≥ k}, with k determined by
P_{H₀}{|X| ≥ k} = α, i.e. k = z_{α/2}.
Example: Suppose X has the following densities under H₀ and H₁:
H₀: f₀(x) = (1/√(2π)) e^(−x²/2); H₁: f₁(x) = (1/2) e^(−|x|), −∞ < x < ∞.
The MP critical region is
{x : √(π/2) e^(−|x|+x²/2) ≥ C}.
Since f₁(x)/f₀(x) is a non-decreasing function of |x| (for |x| ≥ 1), the critical region is {|x| ≥ k}, where k = z_{α/2}.
Example: Suppose
H₀: f₀(x) = (1/√(2π)) e^(−x²/2), −∞ < x < ∞
H₁: f₁(x) = c e^(−x⁴), −∞ < x < ∞ (c the normalising constant).
Let us take a single observation. The MP test of H₀ vs H₁ has the critical region
{x : f₁(x)/f₀(x) ≥ C}
or e^(−x⁴ + x²/2) ≥ C′.
Since the l.h.s. is a non-increasing function of |x| (for |x| ≥ 1/2), the critical region is {|x| ≤ k}, where k = z_{(1−α)/2}.
Example: Suppose under H₀, X has the triangular density f₀(x) = 4x (0 < x < 1/2), 4(1 − x) (1/2 ≤ x < 1), and under H₁ the uniform density f₁(x) = 1 on (0, 1). Let us take a single observation. The MP test of H₀ vs H₁ has the critical region given by
f₁(x)/f₀(x) ≥ C
where
f₁(x)/f₀(x) = 1/(4x) for 0 < x < 1/2, and 1/(4(1 − x)) for 1/2 ≤ x < 1.
We see that f₁(x)/f₀(x) is large near 0 and near 1; hence the MP critical region is of the form
{x ≤ c or x ≥ 1 − c}
with c found from the size condition P₀{X ≤ c} + P₀{X ≥ 1 − c} = 4c² = α, i.e. c = √α/2.
Example: Let X have R(0, θ). We want to test
H₀: θ = θ₀ vs H₁: θ = θ₁ (> θ₀).
We have
L(θ) = (1/θⁿ) I_[0,∞)(x(1)) I_[0,θ](x(n))
Therefore the MP test has the critical region W = {L(θ₁)/L(θ₀) ≥ C}. Now L(θ₁)/L(θ₀) is a non-decreasing function of x(n) (it equals (θ₀/θ₁)ⁿ for x(n) ≤ θ₀ and is infinite for θ₀ < x(n) ≤ θ₁); therefore
L(θ₁)/L(θ₀) ≥ C ⟺ x(n) ≥ k
and the critical region is {x(n) ≥ k}, with k determined by
P{x(n) ≥ k | θ₀} = α.
Since x(n) has p.d.f. f_x(n)(y) = n y^(n−1)/θⁿ, 0 ≤ y ≤ θ, we have
∫_k^θ₀ (n y^(n−1)/θ₀ⁿ) dy = 1 − (k/θ₀)ⁿ = α, i.e. k = θ₀(1 − α)^(1/n).
Remark: The above test is UMP for H₀: θ = θ₀ against H₁: θ > θ₀.
As we have remarked, a UMP test may not always exist. Therefore we further restrict the class of tests by considering unbiased tests (defined below) and then try to obtain a UMP test within the class of unbiased tests. If such a test exists we call it a uniformly most powerful unbiased (UMPU) test.
Definition: Suppose we are testing a simple hypothesis H₀: θ = θ₀ against a composite alternative H₁. A test T of size α is said to be unbiased if its power satisfies P_T(θ) ≥ α for all θ in H₁.
Remark: Suppose θ = θ₁ is one of the alternative values of θ. If the test is not unbiased it may happen that P_{θ₁}(T) < α = P_{θ₀}(T), which means that the probability of rejecting H₀ when it is false is less than the probability of rejecting H₀ when it is true; if the test is unbiased this cannot happen.
Result: An MP (or UMP) test of size α is unbiased.
Proof: Let T be an MP (or UMP) test of size α. Consider another test T′ which rejects the null hypothesis H₀: θ = θ₀ with probability α irrespective of the sample outcome. (We may just toss a coin for which the probability of heads is α and decide to reject H₀ if we get heads, irrespective of the sample values obtained.) Then the size of T′ is α, and the power of T′ is also α. Since T is MP,
P_T(θ) ≥ P_{T′}(θ) = α for θ ≠ θ₀
so T is unbiased.
Remark: It may be shown that the tests in Examples 1, 2 and 3 below are UMPU for the two-sided alternatives.
Now we consider a procedure for constructing tests that has some intuitive appeal and that frequently, though not always, leads to UMP or UMPU tests. The procedure also leads to tests that have desirable large-sample properties.
Suppose we are given a sample (x1, …, xn) from a distribution with p.d.f. f(x, θ) (where θ may be a vector) and we desire to test the null hypothesis H₀: θ ∈ ω (⊂ Ω) against the alternative hypothesis H₁: θ ∈ Ω − ω, where Ω is the parameter space. Define
λ = max_{θ∈ω} L(θ) / max_{θ∈Ω} L(θ)
where max_{θ∈ω} L(θ) denotes the maximum of the likelihood function when θ is restricted to values in ω, and max_{θ∈Ω} L(θ) denotes the maximum of the likelihood when θ takes all possible values in Ω.
Definition: The likelihood ratio (L.R.) test of H₀ against H₁ has the critical region
W = {λ ≤ λ₀}
where λ₀ is determined from the size condition
sup_{θ∈H₀} P{λ ≤ λ₀ | θ} = α.
Remarks: (i) For testing a simple hypothesis against a simple alternative, the likelihood ratio test is equivalent to the test given by the Neyman–Pearson lemma.
(ii) If a sufficient statistic exists, the L.R. test is a function of the sufficient statistic.
Example (1): Let X be a r.v. having a normal distribution N(μ, σ) where σ (= σ₀) is known. We want to test H₀: μ = μ₀ against H₁: μ ≠ μ₀. Then
max_{H₀} L(μ) = (1/(σ₀√(2π))ⁿ) e^(−Σ(xi−μ₀)²/(2σ₀²))
and, since the unrestricted m.l.e. is μ̂ = x̄,
max_Ω L(μ) = (1/(σ₀√(2π))ⁿ) e^(−Σ(xi−x̄)²/(2σ₀²))
The region λ ≤ λ₀ is
e^(−(1/(2σ₀²))[Σ(xi−μ₀)² − Σ(xi−x̄)²]) ≤ λ₀
or −n(x̄ − μ₀)²/(2σ₀²) ≤ log λ₀
or n(x̄ − μ₀)²/σ₀² ≥ k
or |x̄ − μ₀|/(σ₀/√n) ≥ k′
Remarks: (i) The above test is not a UMP test, since there exist other UMP tests for H₁: μ > μ₀ and H₁: μ < μ₀ separately. (ii) √n(x̄ − μ₀)/σ₀ ~ N(0, 1) under H₀, so that k′ can be found easily by using the size condition.
Example (2): Let X ~ N(μ, σ) where both μ and σ are unknown. We want to test H₀: μ = μ₀ against H₁: μ ≠ μ₀. Here
L(μ, σ) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ²))
Under H₀ (μ = μ₀), the m.l.e. of σ is
σ̂₀ = √(Σ(xi − μ₀)²/n)
while in general the m.l.e.s are μ̂ = x̄ and
σ̂ = s₀ = √(Σ(xi − x̄)²/n)
Therefore we have
max_{H₀} L(μ, σ) = (1/(σ̂₀√(2π))ⁿ) e^(−n/2)
and max_{μ,σ} L(μ, σ) = (1/(s₀√(2π))ⁿ) e^(−n/2)
so the L.R. critical region λ ≤ λ₀ is
(s₀/σ̂₀)ⁿ ≤ λ₀
or σ̂₀²/s₀² ≥ λ₀^(−2/n)
Since σ̂₀² = s₀² + (x̄ − μ₀)², this is
1 + (x̄ − μ₀)²/s₀² ≥ k
or n(x̄ − μ₀)²/s² ≥ k′
or √n |x̄ − μ₀|/s ≥ k″
where s² = Σ(xi − x̄)²/(n − 1) = n s₀²/(n − 1).
It is known that √n(x̄ − μ₀)/s has the t distribution on (n − 1) d.f. under H₀. Therefore the value of k″ can be found from the size condition
P{|Y| ≥ k″} = α
where Y ~ t_{n−1}.
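The resulting procedure is the usual two-sided one-sample t-test. A minimal Python sketch (the data and μ₀ are illustrative; scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(10.4, 2.0, size=20)
mu0, alpha = 10.0, 0.05

n = len(x)
t = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)  # s uses divisor n-1
k = stats.t.ppf(1 - alpha / 2, df=n - 1)           # size condition P{|Y| >= k} = alpha

print(t, k, abs(t) >= k)            # reject H0 if |t| >= k
print(stats.ttest_1samp(x, mu0))    # library equivalent
```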
Example (3): Let X ~ N(μ, σ) where both μ and σ are unknown. We want to test
H₀: σ = σ₀ against H₁: σ ≠ σ₀.
Here
L(μ, σ) = (1/(σ√(2π))ⁿ) e^(−Σ(xi−μ)²/(2σ²))
Under H₀ the m.l.e. of μ is μ̂ = x̄; in general the m.l.e.s are μ̂ = x̄ and
σ̂ = s = √(Σ(xi − x̄)²/n)
Then we have
max_{H₀} L(μ, σ) = (1/(σ₀√(2π))ⁿ) e^(−Σ(xi−x̄)²/(2σ₀²)) = (1/(σ₀√(2π))ⁿ) e^(−ns²/(2σ₀²))
and max_{μ,σ} L(μ, σ) = (1/(s√(2π))ⁿ) e^(−n/2)
The L.R. critical region is given by
λ = max_{H₀} L / max_{μ,σ} L ≤ λ₀
or (s²/σ₀²)^(n/2) e^(−(n/2)(s²/σ₀² − 1)) ≤ λ₀
or y^(n/2) e^(−(n/2)(y−1)) ≤ λ₀, where y = s²/σ₀².
We note that y^(n/2) e^(−(n/2)(y−1)) has a maximum at y = 1 and falls away on both sides, so the critical region takes the form
{s²/σ₀² ≤ k1 or s²/σ₀² ≥ k2}
or {ns²/σ₀² ≤ k1′ or ns²/σ₀² ≥ k2′}
But it is known that ns²/σ₀² = Σ(xi − x̄)²/σ₀² has the χ² distribution on (n − 1) d.f. under H₀. Using the χ²_{n−1} tables and the size condition we can get the values of k1′ and k2′.
(3a) Suppose in Example (3) the value of μ (= μ₀) is known. Then the L.R. critical region becomes
{ns₀²/σ₀² ≤ c1 or ns₀²/σ₀² ≥ c2}
where s₀² = Σ(xi − μ₀)²/n; in this case ns₀²/σ₀² = Σ(xi − μ₀)²/σ₀² has χ²_n under H₀.
Example (4): Let X have the exponential distribution
f(x, θ) = (1/θ) e^(−x/θ) (x ≥ 0).
We want to test H₀: θ ≤ θ₀ against H₁: θ > θ₀. Here
L(θ) = (1/θⁿ) e^(−Σxi/θ) = (1/θⁿ) e^(−nx̄/θ)
L(θ) increases in θ up to θ = x̄ and decreases thereafter, so
max_{H₀} L(θ) = (1/θ₀ⁿ) e^(−nx̄/θ₀) for x̄ > θ₀, and (1/x̄ⁿ) e^(−n) for x̄ ≤ θ₀.
Also max_Ω L(θ) = (1/x̄ⁿ) e^(−n), because the m.l.e. of θ is θ̂ = x̄. Hence
λ = (x̄/θ₀)ⁿ e^(−n(x̄/θ₀ − 1)) for x̄ > θ₀, and λ = 1 for x̄ ≤ θ₀.
Since yⁿ e^(−n(y−1)) attains its maximum at y = 1, taking y = x̄/θ₀ we see that λ = 1 if y ≤ 1 and λ ≤ λ₀ for y ≥ k (k > 1). The L.R. critical region is therefore
{x̄/θ₀ ≥ k} or {x̄ ≥ k′}.
Remarks: (i) If one takes H₀: θ ≥ θ₀ (against H₁: θ < θ₀) we get the L.R. critical region as {x̄ ≤ k}; in both cases of one-sided alternatives the L.R. tests are UMP tests.
(ii) Since Σ xi has a gamma distribution, we can find the value of k′ by using the size condition.
Example (5): Let (x1, …, x_n1) be a random sample from N(μ1, σ) and (y1, …, y_n2) a random sample from another normal population N(μ2, σ), where the two samples (distributions) are independent and σ is common. We want to test
H₀: μ1 = μ2 against H₁: μ1 ≠ μ2.
In general the m.l.e.s are μ̂1 = x̄, μ̂2 = ȳ and
σ̂² = (n1 s1² + n2 s2²)/(n1 + n2) = s² (say)
where s1² = (1/n1) Σ (xi − x̄)² and s2² = (1/n2) Σ (yj − ȳ)².
Under H₀,
μ̂1 = μ̂2 = (n1 x̄ + n2 ȳ)/(n1 + n2) = m (say)
and
σ̂₀² = (1/(n1 + n2)) [Σ(xi − m)² + Σ(yj − m)²]
= (1/(n1 + n2)) [Σ{(xi − x̄) + (x̄ − m)}² + Σ{(yj − ȳ) + (ȳ − m)}²]
= (1/(n1 + n2)) [Σ(xi − x̄)² + n1(x̄ − m)² + Σ(yj − ȳ)² + n2(ȳ − m)²]
= (1/(n1 + n2)) [Σ(xi − x̄)² + Σ(yj − ȳ)² + (n1n2/(n1 + n2))(x̄ − ȳ)²]
= s² + (n1n2/(n1 + n2)²)(x̄ − ȳ)² = s₀² (say)
Therefore
max_{H₀} L(μ1, μ2, σ) = (1/((√(2π))^(n1+n2) (s₀²)^((n1+n2)/2))) e^(−(n1+n2)/2)
and the L.R. critical region λ ≤ λ₀ becomes
(s²/s₀²)^((n1+n2)/2) ≤ λ₀
or s₀²/s² ≥ k
or (x̄ − ȳ)² / [(n1 + n2) s² (1/n1 + 1/n2)] ≥ k′
i.e. the critical region is
{|x̄ − ȳ| / (ŝ √(1/n1 + 1/n2)) ≥ k″}
where ŝ² = (n1s1² + n2s2²)/(n1 + n2 − 2); under H₀ this statistic has the t distribution on (n1 + n2 − 2) d.f.
Example (6): Let (x1, …, x_n1) be a random sample from N(μ1, σ1) and (y1, …, y_n2) from N(μ2, σ2), where the two samples (and the two distributions) are independent. We want to test
H₀: σ1 = σ2 against H₁: σ1 ≠ σ2.
In general, the m.l.e.s of μ1, μ2, σ1, σ2 are
μ̂1 = x̄, σ̂1² = (1/n1) Σ(xi − x̄)² = s1², μ̂2 = ȳ, σ̂2² = (1/n2) Σ(yj − ȳ)² = s2²
so that
max L(μ1, μ2, σ1, σ2) = (1/((2π)^((n1+n2)/2) (s1²)^(n1/2) (s2²)^(n2/2))) e^(−(n1+n2)/2)
Under H₀ (common σ), the m.l.e. of σ² is
σ̂² = (n1s1² + n2s2²)/(n1 + n2) = s² (say)
so that
max_{H₀} L = (1/((2π)^((n1+n2)/2) (s²)^((n1+n2)/2))) e^(−(n1+n2)/2)
The L.R. critical region λ ≤ λ₀ becomes
(s1²)^(n1/2)(s2²)^(n2/2) / [(n1s1² + n2s2²)/(n1 + n2)]^((n1+n2)/2) ≤ λ₀
In terms of
f = [n1s1²/(n1 − 1)] / [n2s2²/(n2 − 1)]
this can be written
[((n1 − 1)/(n2 − 1)) f]^(n1/2) / [1 + ((n1 − 1)/(n2 − 1)) f]^((n1+n2)/2) ≤ λ₀′ ……(i)
Writing g(f) for the l.h.s. of (i), we have g(0) = 0 and g(f) → 0 as f → ∞. Furthermore g(f) attains its maximum at f_max = n1(n2 − 1)/(n2(n1 − 1)); it is increasing between 0 and f_max and decreasing in (f_max, ∞). Therefore g(f) ≤ λ₀′ if and only if f < k1 or f > k2, and the L.R. critical region can be written as
{F < k1 or F > k2}
where F = [n1s1²/(n1 − 1)] / [n2s2²/(n2 − 1)].
Here k1, k2 are obtained from the size condition P{F < k1 or F > k2} = α, where F ~ F_{n1−1,n2−1} under H₀.
We recall the standard sampling distributions used in what follows.
Gamma: f(x) = (β^α/Γ(α)) x^(α−1) e^(−βx) for x ≥ 0, and 0 for x < 0 (α > 0, β > 0).
The m.g.f. is Mx(t) = (1 − t/β)^(−α), t < β, and
E(X) = α/β, V(X) = α/β².
In particular, for α = 1 (the exponential distribution), E(X) = 1/β and V(X) = 1/β².
χ²(n): f(x) = (1/(2^(n/2) Γ(n/2))) x^(n/2−1) e^(−x/2), x ≥ 0, with
E(X) = n, V(X) = 2n.
t(n): f(x) = [Γ((n+1)/2)/(Γ(n/2)√(nπ))] (1 + x²/n)^(−(n+1)/2), −∞ < x < ∞.
If X ~ N(0, 1), Y ~ χ²(n), and X and Y are independent, then T = X/√(Y/n) has t(n).
F(m, n): f(x) = [Γ((m+n)/2)/(Γ(m/2)Γ(n/2))] (m/n)^(m/2) x^(m/2−1)/(1 + mx/n)^((m+n)/2) for x ≥ 0, and 0 for x < 0.
If X ~ χ²(m) and Y ~ χ²(n), where X and Y are independent, then Z = (X/m)/(Y/n) has F(m, n).
Percentage points: The upper α-percent point of the χ²(n) distribution is χ²_{n,α}, where
P(χ²(n) > χ²_{n,α}) = α.
The upper α-percent point of the F(m, n) distribution is F_{m,n,α}, where
P(F(m, n) > F_{m,n,α}) = α.
Note that F_{m,n,1−α} = 1/F_{n,m,α}.
Uses of the χ² distribution: (1) Testing the variance of a distribution: Given a sample (x1, …, xn) of size n from a normal distribution N(μ, σ) where σ is unknown, we would like to test H₀: σ = σ₀ against the alternatives σ > σ₀ or σ < σ₀ or σ ≠ σ₀. The tests are summarised as follows.
Case I (μ known): use χ² = Σ(xi − μ)²/σ₀², which has χ²_n under H₀. Reject for H₁: σ > σ₀ if χ² ≥ χ²_{n,α}; for H₁: σ < σ₀ if χ² ≤ χ²_{n,1−α}; for H₁: σ ≠ σ₀ if χ² ≤ χ²_{n,1−α/2} or χ² ≥ χ²_{n,α/2}.
Case II (μ unknown): use χ² = (n − 1)(s′)²/σ₀², where (s′)² = (1/(n−1)) Σ(xi − x̄)²; the same rules apply with χ²_{n−1} in place of χ²_n.
(2) Testing proportions in k (> 2) classes: Suppose a r.v. takes values in one of k (> 2) mutually exclusive classes A1, …, Ak with pi = P(X ∈ Ai), i = 1, 2, …, k, Σ pi = 1. We want to test the hypothesis
H₀: pi = p₀i (i = 1, …, k).
For a random sample (x1, …, xn) of n observations, let the observed frequencies in the k classes be o1, o2, …, ok (Σ oi = n) and the expected frequencies under H₀ be e1, e2, …, ek (Σ ei = n), where ei = n p₀i. Calculate
χ² = Σ_{i=1}^k (oi − ei)²/ei
Then, for large samples, χ² has χ²(k − 1), and the test of H₀ has the critical region
χ² ≥ χ²_{k−1,α}.
Note: If we want to test H₀: p1 = p2 = ⋯ = pk, we take p₀i = 1/k for every i.
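A minimal Python sketch of this k-class test (the frequencies are illustrative; scipy assumed):

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 25, 35])        # o_i, with n = 100
p0 = np.array([0.25, 0.25, 0.25, 0.25])      # hypothesised p_{0i}
expected = observed.sum() * p0                # e_i = n * p_{0i}

chi2 = ((observed - expected) ** 2 / expected).sum()
crit = stats.chi2.ppf(0.95, df=len(observed) - 1)   # chi^2_{k-1, alpha=0.05}
print(chi2, crit, chi2 >= crit)

print(stats.chisquare(observed, expected))   # library equivalent
```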
(3) Testing goodness of fit: Given a sample (x1, …, xn) of observations on a r.v. X arranged in the form of a frequency distribution having k classes A1, …, Ak, we would like to test the hypothesis that the distribution of X has a specified form with p.d.f. (or p.m.f.) f₀(x, θ); the parameter θ may be a scalar or a vector (θ1, …, θℓ).
Let the observed frequencies be o1, …, ok (Σ oi = n) and the expected frequencies under H₀ be e1, …, ek. Calculate
χ² = Σ (oi − ei)²/ei = Σ oi²/ei − n
Then, for large samples, χ² has χ²(k − 1), and the test of H₀ has the critical region
χ² ≥ χ²_{k−1,α}.
Note: If r (of the ℓ) parameters in θ are estimated from the sample, then χ² has χ²(k − r − 1). If any expected frequency is less than 5, we pool that class with an adjoining class and denote by k the effective number of classes after pooling.
(4) Testing independence in a k × ℓ contingency table: Let oij be the observed frequency in cell (i, j) and let
eij = (i-th row total × j-th column total)/n
be the expected frequency under H₀ (independence). Calculate
χ² = Σ_{i=1}^k Σ_{j=1}^ℓ (oij − eij)²/eij = Σ Σ oij²/eij − n
where n is the total frequency. Then χ² has χ² on (k − 1)(ℓ − 1) d.f., and the test of H₀ has the critical region
χ² ≥ χ²_{(k−1)(ℓ−1),α}.
(5) Testing the equality of k correlation coefficients:
H₀: ρ1 = ρ2 = ⋯ = ρk
vs H₁: the correlation coefficients are not all equal. We use Fisher's z-transformation of the correlation coefficient, given by
z = (1/2) logₑ((1 + r)/(1 − r)), ξ = (1/2) logₑ((1 + ρ)/(1 − ρ))
so that, approximately,
z ~ N(ξ, 1/√(n − 3)).
We calculate z1, z2, …, zk corresponding to r1, r2, …, rk having sample sizes n1, n2, …, nk, and define
z̄ = Σ(ni − 3)zi / Σ(ni − 3)
and χ² = Σ_{i=1}^k (ni − 3)(zi − z̄)².
Then χ² has χ² on (k − 1) d.f. and the test of H₀ has the critical region
χ² ≥ χ²_{k−1,α}.
The pooled value z̄ corresponds to a common estimate ρ* via
z̄ = (1/2) logₑ((1 + ρ*)/(1 − ρ*)).
Uses of the t-distribution:
(1) Testing the mean of a single population: Let (x1, …, xn) be a sample of size n from a normal population N(μ, σ²) and, as usual, let x̄ and s² be the sample mean and sample variance. We would like to test the null hypothesis H₀: μ = μ₀ against the alternatives μ > μ₀ or μ < μ₀ or μ ≠ μ₀; the tests are based on t = √n(x̄ − μ₀)/s′, which has t_{n−1} under H₀.
(2) Testing the equality of two population means: Let (x1, …, x_n1) and (y1, …, y_n2) be two samples from independent normal populations N(μ1, σ1) and N(μ2, σ2) respectively; let x̄, ȳ, s1², s2² be as usual. We would like to test H₀: μ1 = μ2 against the alternatives μ1 > μ2 or μ1 < μ2 or μ1 ≠ μ2. The tests are summarised as follows.
Case I (σ1, σ2 known):
H₁: μ1 > μ2 — reject if (x̄ − ȳ)/√(σ1²/n1 + σ2²/n2) ≥ z_α
H₁: μ1 < μ2 — reject if the same statistic ≤ −z_α
H₁: μ1 ≠ μ2 — reject if |x̄ − ȳ|/√(σ1²/n1 + σ2²/n2) ≥ z_{α/2}
Case II (σ1 = σ2 unknown): the test statistic (for testing μ1 − μ2 = δ) is
[(x̄ − ȳ) − δ]/(ŝ√(1/n1 + 1/n2))
which has t_{n1+n2−2} under H₀, where ŝ² = (n1s1² + n2s2²)/(n1 + n2 − 2).
Uses of the F-distribution:
(1) Testing the equality of two variances: Let two samples of sizes n1 and n2 be given from two independent normal populations N(μ1, σ1) and N(μ2, σ2) respectively, and let s1², s2² be the two sample variances. We would like to test the null hypothesis H₀: σ1 = σ2 against H₁: σ1 ≠ σ2. The tests are as follows.
Case I (μ1, μ2 known): Reject H₀ if either
[Σ(xi − μ1)²/n1] / [Σ(yi − μ2)²/n2] ≥ F_{n1,n2,α/2}
or [Σ(yi − μ2)²/n2] / [Σ(xi − μ1)²/n1] ≥ F_{n2,n1,α/2}.
Case II (μ1, μ2 unknown): Reject H₀ if either
(s′1)²/(s′2)² ≥ F_{n1−1,n2−1,α/2} (if s′1 > s′2)
or (s′2)²/(s′1)² ≥ F_{n2−1,n1−1,α/2} (if s′2 > s′1).
(2) Testing the multiple correlation coefficient: given a sample of size $n$ from a trivariate normal population $(x_1, x_2, x_3)$ with multiple correlation coefficient $R_{1(23)}$ of $x_1$ on $(x_2, x_3)$, we would like to test the null hypothesis $H_0: R_{1(23)} = 0$. Let the sample multiple correlation coefficient be $r_{1(23)}$. The test is to reject $H_0$ at level $\alpha$ if
$$\frac{r_{1(23)}^2}{1 - r_{1(23)}^2} \cdot \frac{n-3}{2} \geq F_{2,\, n-3,\,\alpha}$$
(3) Testing the equality of means of $k$ normal distributions ($k > 2$): this is done by the analysis-of-variance F-test (not reproduced here).
Testing correlation coefficients (Fisher's z-transformation): let $r$ be the sample correlation coefficient of a sample of size $n$ from a bivariate normal population with correlation coefficient $\rho$, and let
$$z = \frac{1}{2}\log_e \frac{1+r}{1-r}$$
Though the population correlation coefficient $\rho$ may be widely different from zero, the new statistic $z$ may be assumed to be normally distributed even when $n$ is as small as 10. It has been shown that $z$ has approximate mean
$$\xi = \frac{1}{2}\log_e \frac{1+\rho}{1-\rho}$$
and variance $1/(n-3)$, so that
$$\sqrt{n-3}\,(z - \xi) \sim N(0, 1)$$
The test of $H_0: \rho = \rho_0$ is therefore to reject $H_0$ at level $\alpha$ if
$$\sqrt{n-3}\,|z - \xi_0| \geq N_{\alpha/2}$$
where $\xi_0 = \frac{1}{2}\log_e \frac{1+\rho_0}{1-\rho_0}$ and $N_\alpha$ is the upper $\alpha$ point of the normal distribution $N(0, 1)$.
For two independent samples, with
$$z_i = \frac{1}{2}\log_e \frac{1+r_i}{1-r_i} \quad (i = 1, 2)$$
the test of $H_0: \rho_1 = \rho_2$ rejects $H_0$ at level $\alpha$ if
$$\frac{|z_1 - z_2|}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} \geq N_{\alpha/2}$$
For $k$ samples, as in the $\chi^2$ test described earlier, we use
$$\bar{z} = \frac{\sum_{i=1}^{k} (n_i - 3) z_i}{\sum_{i=1}^{k} (n_i - 3)}$$
and reject $H_0: \rho_1 = \dots = \rho_k$ at level $\alpha$ if
$$\sum_{i=1}^{k} (n_i - 3)(z_i - \bar{z})^2 \geq \chi^2_{k-1,\,\alpha}$$
Finally, recall that a sample proportion $\hat{p}$ based on $n$ observations satisfies, approximately,
$$\hat{p} \sim N\!\left(P,\ \sqrt{\frac{P(1-P)}{n}}\right)$$
which is the basis of the large-sample tests below.
Large sample tests: so far we have considered tests of hypotheses which are valid only when certain assumptions regarding the population are satisfied. Now we consider some approximate tests which are valid only for sufficiently large samples, but they have wide applicability and hold for all populations satisfying certain general conditions, rather than being valid for some particular populations (e.g. normal) only.
(i) Testing a population proportion: let $\hat{p}$ be the sample proportion in a sample of size $n$ from a population with proportion $P$. The test of $H_0: P = P_0$ is to reject $H_0$ at level $\alpha$ if
$$\frac{|\hat{p} - P_0|}{\sqrt{P_0(1-P_0)/n}} \geq N_{\alpha/2}$$
(ii) Testing the equality of two population proportions: let $P_1, P_2$ be two population proportions and $\hat{p}_1, \hat{p}_2$ be the two sample proportions drawn from these independent populations. The test of $H_0: P_1 = P_2$ is to reject $H_0$ at level $\alpha$ if
$$\frac{|\hat{p}_1 - \hat{p}_2|}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \geq N_{\alpha/2}$$
where
$$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$
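The pooled two-proportion z-test written out directly; the counts are invented for the example.

```python
# Large-sample test of H0: P1 = P2 for two independent samples.
import numpy as np
from scipy import stats

x1, n1 = 45, 200   # successes and sample size in sample 1 (illustrative)
x2, n2 = 70, 250

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)             # p_hat = (n1*p1 + n2*p2) / (n1 + n2)

z = (p1 - p2) / np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * stats.norm.sf(abs(z))        # reject H0 if p_value <= alpha
print(z, p_value)
```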
(iii) Testing a standard deviation: let $s$ be the standard deviation of a sample of size $n$ drawn from a population with standard deviation $\sigma$. The test of $H_0: \sigma = \sigma_0$ is to reject $H_0$ at level $\alpha$ if
$$\frac{|s - \sigma_0|}{\sigma_0/\sqrt{2n}} \geq N_{\alpha/2}$$
(iv) Testing the equality of two population standard deviations: let $s_1, s_2$ be the standard deviations of two samples of sizes $n_1, n_2$ from two independent populations with standard deviations $\sigma_1, \sigma_2$. Let
$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
The test of $H_0: \sigma_1 = \sigma_2$ is to reject $H_0$ at level $\alpha$ if
$$\frac{|s_1 - s_2|}{s\sqrt{\dfrac{1}{2n_1} + \dfrac{1}{2n_2}}} \geq N_{\alpha/2}$$
INTERVAL ESTIMATION
Definition: For a random sample $(x_1, \dots, x_n)$ from the distribution of a r.v. $X$ having p.d.f. $f(x, \theta)$, let $L_1 = L_1(x_1, \dots, x_n)$ and $L_2 = L_2(x_1, \dots, x_n)$ be two statistics such that $L_1 \leq L_2$. The interval $[L_1, L_2]$ is a confidence interval for $\theta$ with confidence coefficient $1-\alpha$ $(0 < \alpha < 1)$ if $P_\theta[L_1 \leq \theta \leq L_2] = 1-\alpha$ for all $\theta \in \Omega$. $L_1$ and $L_2$ are called the lower and upper confidence limits, respectively; at least one of them should not be a constant.
Suppose a r.v. $X$ has the normal distribution $N(\mu, \sigma)$ with unknown mean $\mu$ and known standard deviation $\sigma$. Let $(x_1, \dots, x_n)$ be the values of a random sample of size $n$ from this distribution. We know that the sample mean $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ and, hence, $\sqrt{n}(\bar{x} - \mu)/\sigma \sim N(0, 1)$. It follows that
$$P\left\{-1.96 \leq \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \leq 1.96\right\} = 0.95$$
or, equivalently,
$$P\left\{\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right\} = 0.95$$
This shows that, in repeated sampling, the probability is 0.95 that the interval
$$\left\{\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right\}$$
will include $\mu$. We say that the above is a confidence interval for $\mu$ with confidence coefficient 0.95. The two end points are known as the 95% confidence limits for $\mu$.
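The 95% interval above, computed numerically on simulated data with $\sigma$ taken as known.

```python
# 95% confidence interval for mu when sigma is known.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sigma = 2.0
x = rng.normal(loc=7.0, scale=sigma, size=40)

half_width = stats.norm.isf(0.025) * sigma / np.sqrt(len(x))  # isf(0.025) ~= 1.96
ci = (x.mean() - half_width, x.mean() + half_width)
# In repeated sampling, about 95% of such intervals cover the true mu.
print(ci)
```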
Let us now consider the general problem. Let a r.v. $X$ have a distribution depending on an unknown parameter $\theta$ which is to be estimated. Suppose $Z$ is a statistic (usually a function of a sufficient statistic, if one exists) which is a function of $\theta$ but whose distribution does not depend on $\theta$. Such a statistic $Z$ is called a pivotal function. Let $\lambda_1$ and $\lambda_2$ be two numbers such that
$$P\{\lambda_1 \leq Z \leq \lambda_2\} = 1-\alpha$$
The above inequality can be solved so that it assumes the form
$$P\{\theta_1 \leq \theta \leq \theta_2\} = 1-\alpha$$
for all $\theta$, where $\theta_1$ and $\theta_2$ are random variables which do not depend on $\theta$. Finally, if we substitute the sample values, $[\theta_1(x_1, \dots, x_n),\ \theta_2(x_1, \dots, x_n)]$ becomes a confidence interval for $\theta$ with the desired confidence coefficient $1-\alpha$.
Remark: the numbers $\lambda_1, \lambda_2$ may be chosen in several ways, giving rise to several confidence intervals. We usually choose the confidence interval of shortest length.
(1) Let $X \sim N(\mu, \sigma)$ with $\sigma$ known; we want a confidence interval for $\mu$. Let
$$z = \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma}$$
which has the $N(0, 1)$ distribution. For a specified $\alpha$, let $N_{\alpha/2}$ be the upper $\alpha/2$ critical value of $N(0, 1)$. Then
$$P\left\{-N_{\alpha/2} \leq \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \leq N_{\alpha/2}\right\} = 1-\alpha$$
or
$$P\left\{\bar{x} - N_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + N_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1-\alpha$$
so that
$$\left\{\bar{x} - N_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + N_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\}$$
is a confidence interval for $\mu$ with confidence coefficient $1-\alpha$.
(2) If $\sigma$ is unknown, let
$$z = \frac{\sqrt{n}(\bar{x} - \mu)}{s}, \qquad \text{where } s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
which has the $t(n-1)$ distribution. Then
$$P\left\{-t_{n-1,\,\alpha/2} \leq \frac{\sqrt{n}(\bar{x} - \mu)}{s} \leq t_{n-1,\,\alpha/2}\right\} = 1-\alpha$$
or
$$P\left\{\bar{x} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}\right\} = 1-\alpha$$
so that
$$\left\{\bar{x} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}\right\}$$
is a confidence interval for $\mu$ with confidence coefficient $1-\alpha$.
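A sketch of the t-based interval of (2); note ddof=1 so that $s^2$ carries the $1/(n-1)$ divisor.

```python
# (1 - alpha) confidence interval for mu when sigma is unknown.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=1.5, size=15)
alpha = 0.05

s = x.std(ddof=1)                               # s^2 = sum (x_i - x_bar)^2 / (n - 1)
t_crit = stats.t.isf(alpha / 2, df=len(x) - 1)  # t_{n-1, alpha/2}
half_width = t_crit * s / np.sqrt(len(x))
print((x.mean() - half_width, x.mean() + half_width))
```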
(3) For a confidence interval for $\sigma^2$ when $\mu$ is known, the pivot $\sum_{i=1}^{n}(x_i - \mu)^2/\sigma^2$ has the $\chi^2(n)$ distribution, so
$$P\left\{\chi^2_{n,\,1-\alpha/2} \leq \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\sigma^2} \leq \chi^2_{n,\,\alpha/2}\right\} = 1-\alpha$$
and the corresponding confidence interval is
$$\left\{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{n,\,\alpha/2}},\ \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{n,\,1-\alpha/2}}\right\}$$
(4) If $\mu$ is unknown, let
$$z = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{where } s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
which has the $\chi^2(n-1)$ distribution. Then
$$P\left\{\chi^2_{n-1,\,1-\alpha/2} \leq \frac{(n-1)s^2}{\sigma^2} \leq \chi^2_{n-1,\,\alpha/2}\right\} = 1-\alpha$$
or
$$P\left\{\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}\right\} = 1-\alpha$$
so that
$$\left\{\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}\right\}$$
is a confidence interval for $\sigma^2$ with confidence coefficient $1-\alpha$.
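A sketch of the $\chi^2$-based interval of (4) for $\sigma^2$; note that the upper $\chi^2$ point yields the lower confidence limit.

```python
# (1 - alpha) confidence interval for sigma^2 (mu unknown).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=0.0, scale=2.0, size=20)
alpha, n = 0.05, len(x)

ssq = x.var(ddof=1)                     # (n-1) s^2 / sigma^2 ~ chi2(n-1)
lower = (n - 1) * ssq / stats.chi2.isf(alpha / 2, df=n - 1)  # chi2_{n-1, alpha/2}
upper = (n - 1) * ssq / stats.chi2.ppf(alpha / 2, df=n - 1)  # chi2_{n-1, 1-alpha/2}
print((lower, upper))
```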
(5) Let $X$ have the exponential distribution with rate $\lambda$. Then $z = 2\lambda n \bar{x}$ has the $\chi^2(2n)$ distribution, so
$$P\left\{\chi^2_{2n,\,1-\alpha/2} \leq 2\lambda n \bar{x} \leq \chi^2_{2n,\,\alpha/2}\right\} = 1-\alpha$$
or
$$P\left\{\frac{\chi^2_{2n,\,1-\alpha/2}}{2n\bar{x}} \leq \lambda \leq \frac{\chi^2_{2n,\,\alpha/2}}{2n\bar{x}}\right\} = 1-\alpha$$
so that
$$\left\{\frac{\chi^2_{2n,\,1-\alpha/2}}{2n\bar{x}},\ \frac{\chi^2_{2n,\,\alpha/2}}{2n\bar{x}}\right\}$$
is a confidence interval for $\lambda$ with confidence coefficient $1-\alpha$.
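The exponential pivot $2\lambda n\bar{x} \sim \chi^2(2n)$ of (5), inverted numerically on simulated data with an assumed true rate.

```python
# (1 - alpha) confidence interval for the exponential rate lambda.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.exponential(scale=1 / 0.8, size=30)    # true rate lambda = 0.8 (illustrative)
alpha, n = 0.05, len(x)

# 2 * lambda * n * x_bar ~ chi2(2n), so invert the two chi-square quantiles.
lower = stats.chi2.ppf(alpha / 2, df=2 * n) / (2 * n * x.mean())
upper = stats.chi2.isf(alpha / 2, df=2 * n) / (2 * n * x.mean())
print((lower, upper))
```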
(6) Let $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$, where $\sigma_1 = \sigma_2$ (unknown). We want a confidence interval for $(\mu_1 - \mu_2)$. With the pooled variance $s^2$ as in the two-sample t-test,
$$P\left\{(\bar{x} - \bar{y}) - t_{n_1+n_2-2,\,\alpha/2}\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}} \leq \mu_1 - \mu_2 \leq (\bar{x} - \bar{y}) + t_{n_1+n_2-2,\,\alpha/2}\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}\right\} = 1-\alpha$$
so that
$$\left\{(\bar{x} - \bar{y}) - t_{n_1+n_2-2,\,\alpha/2}\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}},\ (\bar{x} - \bar{y}) + t_{n_1+n_2-2,\,\alpha/2}\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}\right\}$$
is a confidence interval for $\mu_1 - \mu_2$ with confidence coefficient $1-\alpha$.
(7) Let $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$, where $\mu_1, \mu_2$ are unknown, and suppose a confidence interval for $\sigma_1^2/\sigma_2^2$ is required. Let
$$Z = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
which has the $F(n_1 - 1,\, n_2 - 1)$ distribution. Then
$$P\left\{F_{n_1-1,\, n_2-1,\,1-\alpha/2} \leq \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \leq F_{n_1-1,\, n_2-1,\,\alpha/2}\right\} = 1-\alpha$$
so that
$$\left\{\frac{1}{F_{n_1-1,\, n_2-1,\,\alpha/2}} \cdot \frac{S_1^2}{S_2^2},\ \frac{1}{F_{n_1-1,\, n_2-1,\,1-\alpha/2}} \cdot \frac{S_1^2}{S_2^2}\right\}$$
is a confidence interval for $\sigma_1^2/\sigma_2^2$ with confidence coefficient $1-\alpha$.
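A sketch of the F-based interval of (7) for the variance ratio, on simulated samples.

```python
# (1 - alpha) confidence interval for sigma1^2 / sigma2^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.0, 2.0, size=25)
y = rng.normal(0.0, 1.0, size=30)
alpha = 0.05
dfn, dfd = len(x) - 1, len(y) - 1

ratio = x.var(ddof=1) / y.var(ddof=1)          # S1^2 / S2^2
lower = ratio / stats.f.isf(alpha / 2, dfn, dfd)   # divide by F_{n1-1, n2-1, alpha/2}
upper = ratio / stats.f.ppf(alpha / 2, dfn, dfd)   # divide by F_{n1-1, n2-1, 1-alpha/2}
print((lower, upper))
```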
Confidence region for $(\mu, \sigma)$: one may choose a confidence region for $(\mu, \sigma)$ using the two relations
$$P\left\{\bar{x} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}\right\} = 1-\alpha$$
and the corresponding $\chi^2$ relation for $\sigma^2$, with boundaries
$$t_a = \bar{x} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}},\ \text{etc.}, \qquad x_a = \frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}}$$
But it is difficult to find the probability of the sample falling in the shaded region (confidence region), since the two events are not independent.
Alternatively, using the independence of $\bar{x}$ and $s^2$, we choose the confidence region with the help of the relations
$$P\left\{-N_{\alpha_1/2} \leq \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \leq N_{\alpha_1/2}\right\} = 1-\alpha_1$$
and
$$P\left\{\chi^2_{n-1,\,1-\alpha_2/2} \leq \frac{(n-1)s^2}{\sigma^2} \leq \chi^2_{n-1,\,\alpha_2/2}\right\} = 1-\alpha_2$$
so that
$$P\left\{-N_{\alpha_1/2} \leq \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \leq N_{\alpha_1/2},\ \ \chi^2_{n-1,\,1-\alpha_2/2} \leq \frac{(n-1)s^2}{\sigma^2} \leq \chi^2_{n-1,\,\alpha_2/2}\right\} = (1-\alpha_1)(1-\alpha_2)$$
and we obtain the boundaries of the confidence region without difficulty. This is shown by a shaded region in the original figure (not reproduced), where $q = N_{\alpha_1/2}$ and $q_1 = \chi^2_{n-1,\,\alpha_2/2}$.
Large-sample confidence intervals:
(i) For a population proportion $P$, the sample proportion $\hat{p}$ satisfies, approximately,
$$\frac{\hat{p} - P}{\sqrt{P(1-P)/n}} \sim N(0, 1)$$
or, replacing $P$ by $\hat{p}$ in the denominator,
$$\frac{\hat{p} - P}{\sqrt{\hat{p}(1-\hat{p})/n}} \sim N(0, 1)$$
Hence
$$P\left\{-N_{\alpha/2} \leq \frac{\hat{p} - P}{\sqrt{\hat{p}(1-\hat{p})/n}} \leq N_{\alpha/2}\right\} = 1-\alpha$$
or
$$P\left\{\hat{p} - N_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \leq P \leq \hat{p} + N_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right\} = 1-\alpha$$
so that
$$\left\{\hat{p} - N_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + N_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right\}$$
is a (large-sample) confidence interval for $P$ with confidence coefficient $1-\alpha$.
(ii) For two samples we can similarly find a confidence interval for $P_1 - P_2$ as follows:
$$P\left\{-N_{\alpha/2} \leq \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \leq N_{\alpha/2}\right\} = 1-\alpha$$
where
$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$
so that
$$\left\{(\hat{p}_1 - \hat{p}_2) - N_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)},\ (\hat{p}_1 - \hat{p}_2) + N_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}\right\}$$
is a confidence interval for $P_1 - P_2$ with confidence coefficient $1-\alpha$.
(iii) Let $X$ be a r.v. having mean $\mu$ and variance $\sigma^2$, and suppose we want a (large-sample) confidence interval for $\sigma$. Then
$$P\left\{-N_{\alpha/2} \leq \frac{s - \sigma}{s/\sqrt{2n}} \leq N_{\alpha/2}\right\} = 1-\alpha$$
or
$$P\left\{s - N_{\alpha/2}\frac{s}{\sqrt{2n}} \leq \sigma \leq s + N_{\alpha/2}\frac{s}{\sqrt{2n}}\right\} = 1-\alpha$$
so that
$$\left\{s - N_{\alpha/2}\frac{s}{\sqrt{2n}},\ s + N_{\alpha/2}\frac{s}{\sqrt{2n}}\right\}$$
is a confidence interval for $\sigma$ with confidence coefficient $1-\alpha$.
(iv) For two samples we can similarly find a confidence interval for $\sigma_1 - \sigma_2$ as follows:
$$P\left\{-N_{\alpha/2} \leq \frac{(s_1 - s_2) - (\sigma_1 - \sigma_2)}{s\sqrt{\dfrac{1}{2n_1} + \dfrac{1}{2n_2}}} \leq N_{\alpha/2}\right\} = 1-\alpha$$
so that
$$\left\{(s_1 - s_2) - N_{\alpha/2}\, s\sqrt{\tfrac{1}{2n_1} + \tfrac{1}{2n_2}},\ (s_1 - s_2) + N_{\alpha/2}\, s\sqrt{\tfrac{1}{2n_1} + \tfrac{1}{2n_2}}\right\}$$
is a confidence interval for $\sigma_1 - \sigma_2$ with confidence coefficient $1-\alpha$, where $s^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$.
(v) Let $(X, Y)$ have a bivariate normal distribution with correlation coefficient $\rho$, and suppose we want a confidence interval for $\rho$. Let
$$\xi = \frac{1}{2}\log_e \frac{1+\rho}{1-\rho} \qquad \text{and} \qquad z = \frac{1}{2}\log_e \frac{1+r}{1-r}$$
Then
$$\sqrt{n-3}\,(z - \xi) \sim N(0, 1)$$
so that
$$P\left\{z - \frac{N_{\alpha/2}}{\sqrt{n-3}} \leq \xi \leq z + \frac{N_{\alpha/2}}{\sqrt{n-3}}\right\} = 1-\alpha$$
and
$$\left\{z - \frac{N_{\alpha/2}}{\sqrt{n-3}},\ z + \frac{N_{\alpha/2}}{\sqrt{n-3}}\right\}$$
gives a $(1-\alpha)$ confidence interval for $\xi$. From this we can easily obtain the corresponding confidence interval for $\rho$, since $\rho = (e^{2\xi} - 1)/(e^{2\xi} + 1)$ is an increasing function of $\xi$.
NON-PARAMETRIC INFERENCE
In all problems of statistical inference considered so far we assumed that the distribution of the random variable being sampled is known except for some parameters. In practice, however, the functional form of the distribution is seldom, if ever, known. It is therefore desirable to devise procedures that are free from this assumption concerning the distribution. Such procedures are commonly referred to as distribution-free or non-parametric methods. The term distribution-free refers to the fact that no assumptions are made about the underlying distribution, except that the distribution function being sampled is absolutely continuous or purely discrete. The term non-parametric refers to the fact that there are no parameters involved in the traditional sense of the term 'parameter' used so far.
We will consider only the inferential problem of testing of hypotheses and describe a few non-parametric tests.
Single-sample problems: (a) The problem of fit: the problem of fit is to test the hypothesis that a sample of observations $(x_1, \dots, x_n)$ is from some specified distribution against the alternative that it is from some other distribution. Thus we have to test
$$H_0: F(x) = F_0(x) \ \text{for all } x \qquad \text{against} \qquad H_1: F(x) \neq F_0(x) \ \text{for some } x$$
(i) Chi-square test: let there be $k$ categories and let $p_i$ be the probability that a random observation from $F_0(x)$ falls in the $i$-th category $(i = 1, 2, \dots, k)$. For a sample of size $n$, let $o_i$ be the observed frequency in the $i$-th category and let $e_i = np_i$ be the expected frequency in the $i$-th category under $H_0$. Calculate
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
The larger the value of $\chi^2$, the more likely it is that the $o_i$'s did not come from $F_0(x)$. For large samples the $\chi^2$-statistic has a $\chi^2$ distribution on $(k-1)$ d.f. Thus an approximate level-$\alpha$ test is provided by rejecting $H_0$ if
$$\chi^2 > \chi^2_{k-1,\,\alpha}$$
(ii) Kolmogorov–Smirnov one-sample test: for the sample $(x_1, \dots, x_n)$, let the empirical distribution function $F_n(x)$ be given by
$$F_n(x) = \begin{cases} 0 & \text{if } x < x_{(1)} \\ k/n & \text{if } x_{(k)} \leq x < x_{(k+1)},\ \ k = 1, 2, \dots, n-1 \\ 1 & \text{if } x \geq x_{(n)} \end{cases}$$
where $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ are the order statistics. Evidently,
$$F_n(x) = \frac{\text{number of } x_k\text{'s } (1 \leq k \leq n) \text{ that are } \leq x}{n}$$
For testing $H_0: F(x) = F_0(x)$ against the two-sided alternative $H_1: F(x) \neq F_0(x)$ we use the Kolmogorov–Smirnov statistic
$$D_n = \sup_x |F_n(x) - F_0(x)|$$
It can be shown that the K-S statistic $D_n$ is completely distribution-free for any continuous distribution $F_0(x)$. The test of $H_0$ has the critical region
$$D_n > D_{n,\,\alpha}$$
Remark 1: for testing $H_0: F(x) = F_0(x)$ against the one-sided alternatives $H_1: F(x) > F_0(x)$ or $H_2: F(x) < F_0(x)$, tests based on the one-sided K-S statistics $D_n^+$ and $D_n^-$ are also available.
Remark 2: for small samples the $\chi^2$-test is not available but the K-S test can be applied; for discrete distributions the K-S test is not available but the $\chi^2$-test can be applied. The K-S test is more powerful than the $\chi^2$-test.
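The one-sample K-S test via scipy; here $F_0$ is taken to be $N(0, 1)$ purely for illustration, and the data are simulated.

```python
# Kolmogorov-Smirnov one-sample test of H0: F = F0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(loc=0.2, scale=1.0, size=40)    # simulated data

# Two-sided test against F0 = N(0, 1); 'less'/'greater' give one-sided variants.
D_n, p_value = stats.kstest(x, "norm", args=(0, 1), alternative="two-sided")
print(D_n, p_value)   # reject H0 if D_n exceeds D_{n, alpha}, i.e. p_value <= alpha
```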
(b) The problem of location: let $(x_1, \dots, x_n)$ be a random sample from a distribution $F(x)$ with unknown median $\xi$, where $F(x)$ is assumed to be continuous in the neighbourhood of $\xi$. By the definition of the median, $P(X \geq \xi) = \frac{1}{2}$. We would like to test the hypothesis $H_0: \xi = \xi_0$.
Sign test: we form the $n$ differences $(x_i - \xi_0)$, $i = 1, 2, \dots, n$, and find the number $R$ of positive differences (differences having positive signs), i.e. those with $(x_i - \xi_0) > 0$. If $H_0$ is true, $P(X_i - \xi_0 > 0) = \frac{1}{2}$, $i = 1, 2, \dots, n$, and $R$ has a binomial distribution with parameters $n$ and $\frac{1}{2}$. We may use an exact test of $H_0$ based on the binomial distribution. In the case of the one-sided alternative
$$H_1: \xi > \xi_0$$
the sample will have an excess of positive signs, and in the case of
$$H_1: \xi < \xi_0$$
an excess of negative signs. The critical values $R_{1,\alpha}$, $R_{2,\alpha}$, $R_{\alpha/2}$ are calculated from tables of the binomial distribution. For large $n$ we may take
$$\frac{R - n/2}{\sqrt{n/4}} \sim N(0, 1)$$
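The exact sign test via the binomial law of $R$; $\xi_0$ and the data are illustrative, and zero differences are dropped by the usual convention.

```python
# Sign test of H0: median xi = xi0, using the exact Binomial(n, 1/2) law of R.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(loc=0.6, scale=1.0, size=20)    # sample with unknown median
xi0 = 0.0

d = x - xi0
d = d[d != 0]                        # zero differences are discarded by convention
R = int((d > 0).sum())               # number of positive signs

res = stats.binomtest(R, n=len(d), p=0.5, alternative="greater")  # H1: xi > xi0
print(R, res.pvalue)
```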
Paired-sample sign test: here we assume that we have a random sample of $n$ pairs $(x_i, y_i)$ giving the differences
$$D_i = x_i - y_i, \quad i = 1, \dots, n$$
We now have a single sample $D_1, \dots, D_n$ and we can test $H_0: \xi = \xi_0$ (where $\xi_0$ can be taken to be 0) by the sign test described above.
Remark: the above two sign tests are, respectively, analogous to the single-sample t-test and the paired t-test for testing the location of a normal distribution.
Two-sample problems: let $(x_1, \dots, x_m)$ and $(y_1, \dots, y_n)$ be independent random samples from two absolutely continuous distributions $F_X(x)$ and $F_Y(y)$, respectively. We wish to test $H_0: F_X(x) = F_Y(x)$ for all $x$.
Run test (Wald–Wolfowitz): we arrange the $m$ x's and $n$ y's together in increasing order of size, obtaining a pattern such as $XYYXXYYYXY$, and count the number of runs. If $H_0$ is true, the $(m+n)$ values will be well mixed up and we expect that $R$, the total number of runs, will be relatively large. But $R$ will be small if the samples come from different populations, i.e. if $H_0$ is false; in the extreme case, if all the values of $y$ are greater than all the values of $x$, or vice versa, there will be only two runs. The test of $H_0$ therefore has the critical region
$$R \leq R_\alpha$$
where $R_\alpha$ is chosen so that
$$P(R \leq R_\alpha \mid H_0) \leq \alpha$$
Tables of critical values of $R$ based on the above have been given by Swed and Eisenhart. For large $m, n$ (both greater than 10), $R$ is asymptotically normally distributed with
$$E(R) = \frac{2mn}{m+n} + 1 \qquad \text{and} \qquad V(R) = \frac{2mn(2mn - m - n)}{(m+n)^2(m+n-1)}$$
Median test: we arrange the x's and y's in ascending order of size and find the median $M$ of the combined sample. Let $V$ be the number of x's that fall below $M$, and let $p$ be the total number of observations (out of $m+n$) below $M$. If $V$ is large, it is reasonable to conclude that the actual median of $X$ is smaller than the median of $Y$, i.e. $H_0: F_X(x) = F_Y(x)$ is rejected in favour of $H_1: F_X(x) > F_Y(x)$. On the other hand, if $V$ is too small, it is reasonable to conclude that the actual median of $X$ is greater than the median of $Y$, i.e. $H_0: F_X(x) = F_Y(x)$ is rejected in favour of $H_1: F_X(x) < F_Y(x)$. For the two-sided alternative, we use the two-sided test. Under $H_0$, $V$ has the hypergeometric distribution
$$P(V = u \mid H_0) = \frac{\binom{m}{u}\binom{n}{p-u}}{\binom{m+n}{p}}, \qquad u = 0, 1, \dots, \min(m, p)$$
Wilcoxon–Mann–Whitney U test: this is the most widely used two-sample non-parametric test and is a useful alternative to the t-test when the t-test assumptions are in doubt.
The test is, like the run test, based on the pattern of the $m$ x's and $n$ y's arranged in ascending order of size. The Mann–Whitney U statistic is defined as the number of times an $X$ precedes a $Y$ in the combined sample of size $m+n$. We define
$$z_{ij} = \begin{cases} 1, & x_i < y_j \\ 0, & x_i > y_j \end{cases} \qquad (i = 1, \dots, m;\ j = 1, \dots, n)$$
and write
$$U = \sum_{i=1}^{m}\sum_{j=1}^{n} z_{ij}$$
Note that $\sum_{i=1}^{m} z_{ij}$ is the number of x's that are smaller than $y_j$, and hence $U$ is the total number of pairs $(x_i, y_j)$ in which $x_i$ is smaller than $y_j$. For example, suppose the combined sample when ordered is
$$X\ X\ Y\ Y\ X\ Y$$
Then $U = 2 + 2 + 3 = 7$, because two values of $X$ are smaller than each of the first two Y's and three values of $X$ are smaller than the third $Y$.
It is observed that $U = 0$ if all the $x_i$'s are larger than all the $y_j$'s, and $U = mn$ if all the $x_i$'s are smaller than all the $y_j$'s. Thus $0 \leq U \leq mn$. If $U$ is large, the values of $Y$ tend to be larger than those of $X$ ($Y$ is stochastically larger than $X$), and this supports the alternative $F_X(x) > F_Y(x)$. Similarly, if $U$ is small, the values of $Y$ tend to be smaller than those of $X$, and this supports the alternative $F_X(x) < F_Y(x)$.
Under $H_0$,
$$E(U) = \frac{mn}{2} \qquad \text{and} \qquad V(U) = \frac{mn(m+n+1)}{12}$$
Tables of the distribution of $U$ for small samples are given by Mann and Whitney. For large samples $U$ has an asymptotic normal distribution, i.e.
$$\frac{U - \dfrac{mn}{2}}{\sqrt{\dfrac{mn(m+n+1)}{12}}} \sim N(0, 1)$$
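The U test via scipy; note that scipy's statistic counts the pairs with $x > y$, so it is converted below to the $U$ (number of pairs with $x < y$) defined above.

```python
# Wilcoxon-Mann-Whitney U test of H0: F_X = F_Y.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, size=12)
y = rng.normal(0.8, 1.0, size=14)

res = stats.mannwhitneyu(x, y, alternative="two-sided")
U_xy = len(x) * len(y) - res.statistic   # scipy counts pairs with x > y; convert
print(U_xy, res.pvalue)

# Large-sample check: (U - mn/2) / sqrt(mn(m+n+1)/12) ~ N(0, 1) under H0.
m, n = len(x), len(y)
z = (U_xy - m * n / 2) / np.sqrt(m * n * (m + n + 1) / 12)
print(2 * stats.norm.sf(abs(z)))
```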
APPENDIX
Theorem: suppose $X$ is a continuous r.v. with p.d.f. $f_X(x)$. Set $\mathcal{X} = \{x : f_X(x) > 0\}$. Assume that
(i) $y = g(x)$ defines a one-to-one transformation of $\mathcal{X}$ onto a set $\mathcal{Y}$;
(ii) the derivative of $x = g^{-1}(y)$ with respect to $y$ is continuous and non-zero for $y \in \mathcal{Y}$, where $g^{-1}(y)$ is the inverse function of $g$, i.e. $g^{-1}(y)$ is that $x$ for which $g(x) = y$.
Then $Y = g(X)$ is a continuous r.v. with p.d.f.
$$f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d}{dy}\, g^{-1}(y)\right|, \qquad y \in \mathcal{Y}$$
Theorem: let $X_1$ and $X_2$ be jointly continuous r.v.s with p.d.f. $f_{X_1, X_2}(x_1, x_2)$. Set $\mathcal{X} = \{(x_1, x_2) : f_{X_1, X_2}(x_1, x_2) > 0\}$. Assume that
(i) $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ define a one-to-one transformation of $\mathcal{X}$ onto a set $\mathcal{Y}$;
(ii) the first partial derivatives of $x_1 = g_1^{-1}(y_1, y_2)$ and $x_2 = g_2^{-1}(y_1, y_2)$ are continuous over $\mathcal{Y}$;
(iii) the Jacobian of the transformation is non-zero for $(y_1, y_2) \in \mathcal{Y}$.
Then the joint p.d.f. of $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ is given by
$$f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}\!\left(g_1^{-1}(y_1, y_2),\ g_2^{-1}(y_1, y_2)\right)|J|$$
where
$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[1ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix}$$
χ²-distribution
Definition: a continuous r.v. $X$ is said to have the $\chi^2$-distribution on $n$ degrees of freedom if its p.d.f. is given by
$$f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{\frac{n}{2}-1} e^{-x/2}, \quad x \geq 0; \qquad f(x) = 0, \quad x < 0$$
The m.g.f. of $X$ is given by
$$M_X(t) = E\,e^{tX} = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^\infty x^{\frac{n}{2}-1} e^{-x(1-2t)/2}\, dx = \frac{1}{2^{n/2}\,\Gamma(n/2)} \cdot \frac{\Gamma(n/2)}{\left(\frac{1-2t}{2}\right)^{n/2}} = (1-2t)^{-n/2}, \qquad t < \tfrac{1}{2}$$
For $n \leq 2$ the p.d.f. of $\chi^2(n)$ steadily decreases as $x$ increases, while for $n > 2$ there is a unique maximum at $x = n - 2$.
If $X \sim N(0, 1)$, then
$$M_{X^2}(t) = E\!\left(e^{tX^2}\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2 - x^2/2}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2(1-2t)/2}\, dx = \frac{1}{\sqrt{2\pi}} \cdot \frac{\sqrt{2\pi}}{\sqrt{1-2t}} = (1-2t)^{-1/2}$$
which is the m.g.f. of $\chi^2(1)$; hence $X^2$ has the $\chi^2$-distribution on 1 d.f.
Theorem: let $Y_1, \dots, Y_k$ be independent r.v.s with $Y_i \sim \chi^2(n_i)$, $i = 1, \dots, k$. Then
$$Z = \sum_{i=1}^{k} Y_i \sim \chi^2(n_1 + n_2 + \dots + n_k)$$
Proof: the m.g.f. of $Z$ is
$$M_Z(t) = E\,e^{tZ} = E\,e^{t\sum_i Y_i} = \prod_{i=1}^{k} E\!\left(e^{tY_i}\right) = (1-2t)^{-(n_1 + \dots + n_k)/2}$$
which is the m.g.f. of $\chi^2(n_1 + \dots + n_k)$.
Corollary: let $(x_1, \dots, x_n)$ be a random sample from a normal distribution $N(\mu, \sigma)$. Then $\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2}$ has the $\chi^2$ distribution on $n$ d.f.
Theorem: let $(x_1, \dots, x_n)$ be a random sample from a normal distribution $N(\mu, \sigma)$. Let $\bar{x} = \sum_{i=1}^{n} x_i/n$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ be the sample mean and sample variance. Then $\frac{(n-1)s^2}{\sigma^2}$ has the $\chi^2$ distribution on $(n-1)$ d.f.
Theorem: for large $n$, $\sqrt{2\chi^2}$ can be shown to be approximately normally distributed with mean $\sqrt{2n-1}$ and standard deviation unity.
Theorem: assume that $Y$ has a distribution function $F_Y$ which satisfies some regularity conditions and which has $r$ unknown parameters $\theta_1, \theta_2, \dots, \theta_r$, and that $(y_1, \dots, y_n)$ is a random sample on $Y$. Let $\hat{\theta}_1, \dots, \hat{\theta}_r$ be the m.l.e.'s of the $\theta$'s. Suppose the sample is distributed into $k$ non-overlapping intervals $\{I_j\}$, where $I_j = \{y : a_{j-1} < y \leq a_j\}$, $j = 1, \dots, k$ ($a_0 = -\infty$, $a_k = \infty$). Let $x_1, \dots, x_k$ be the numbers of sample values falling in these intervals, respectively. If we define $\hat{p}_j$ as the probability of $I_j$ computed from $F_Y$ with $\hat{\theta}_1, \dots, \hat{\theta}_r$ replacing $\theta_1, \dots, \theta_r$, then the statistic
$$z = \sum_{j=1}^{k} \frac{(x_j - n\hat{p}_j)^2}{n\hat{p}_j}$$
is approximately distributed as $\chi^2$ on $(k - r - 1)$ d.f. as $n$ gets larger.
Student's t-distribution
The p.d.f. of the t-distribution on $n$ degrees of freedom is
$$f(x) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\sqrt{n\pi}} \cdot \frac{1}{\left(1 + \frac{x^2}{n}\right)^{\frac{n+1}{2}}}, \qquad -\infty < x < \infty$$
For $n = 1$ this reduces to
$$f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \qquad -\infty < x < \infty$$
which shows that it is a Cauchy distribution (whose mean does not exist). We will, therefore, assume that $n > 1$.
Remark: the p.d.f. of the t-distribution is symmetric about the origin. For large $n$ the t-distribution tends to the normal distribution. For small $n$, however, the t-distribution deviates considerably from the normal; in fact, if $T \sim t(n)$ and $Z \sim N(0, 1)$, then $P(|T| \geq c) > P(|Z| \geq c)$ for every $c > 0$, i.e. the t-distribution has heavier tails.
For $2r < n$, the even moments are
$$\mu_{2r} = E(X^{2r}) = \frac{2\,\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\sqrt{n\pi}} \int_0^\infty \frac{x^{2r}}{\left(1 + \frac{x^2}{n}\right)^{\frac{n+1}{2}}}\, dx$$
Theorem: let $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then
$$U = \frac{X}{\sqrt{Y/n}}$$
has the t-distribution on $n$ d.f.