Mathematical Statistics Lecture Notes: Chapter 0: Review of Probability
Note: This document has not been checked for errors and typos. It is only used
as a reference for class preparation before each lecture. The actual notes from
each lecture supersede this document and serve as the final materials for this
course.
Chapter 0: Review of Probability
Review of factorization criteria for independence of random variables
Let X be a random variable (rv) or a vector of rvs.
cumulative distribution function (CDF) of X (discrete and continuous);
(complement of CDF: CCDF)
probability mass function (pmf) of X (discrete);
probability density function (pdf) of X (continuous, also used for discrete);
moment generating function (mgf) of X (discrete and continuous).
Definition 0.1: Jointly distributed random variables $X_1, X_2, \ldots, X_n$ are independent if, and only if (iff), all events of the form $\{X_1 \in A_1\}, \{X_2 \in A_2\}, \ldots, \{X_n \in A_n\}$ are independent.
Definition 0.1 leads to the following equivalent criterion for independence:
$$E\left[\prod_{i=1}^{n} g_i(X_i)\right] = \prod_{i=1}^{n} E[g_i(X_i)], \qquad (5)$$
for all functions $g_i$ for which all the expectations in (5) exist.
Remark 0.1: Suppose $X_1, X_2, \ldots, X_n$ are independent. Choose $g_i(x) = x$ in (5) to obtain
$$E\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} E[X_i]. \qquad (6)$$
The converse is not true, however. Specifically, (6) does not imply independence. It implies only that $X_1, X_2, \ldots, X_n$ are pairwise uncorrelated. For most standard multivariate distributions, though, (6) does imply that $X_1, X_2, \ldots, X_n$ are independent.
Example 0.1: Consider a discrete random variable $X$ with pmf
$$p_X(x) = \begin{cases} 0.25, & x = -2 \\ 0.25, & x = -1 \\ 0.25, & x = 1 \\ 0.25, & x = 2 \end{cases}$$
Let $Y = X^2$. It is clear that $Y$ depends on $X$. However,
$$\operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = E[X^3] - E[X]E[X^2] = \tfrac{1}{4}(-8-1+1+8) - \tfrac{1}{4}(-2-1+1+2)\cdot\tfrac{1}{4}(4+1+1+4) = 0.$$
So X and Y are not independent but they are uncorrelated.
Example 0.2: Consider the uniform distribution on a disk of radius 1 centered at the origin, which has pdf:
$$f_{X,Y}(x,y) = \frac{1}{\pi}, \qquad x^2 + y^2 \le 1. \qquad (7)$$
Remark 0.2: An equivalent necessary and sufficient condition for independence is that
$$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} h_i(x_i) \qquad (8)$$
and the support set $\{(x_1,\ldots,x_n): f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) > 0\}$ is a Cartesian product, for example,
$\{x_1 \in (a_1,b_1)\}, \ldots, \{x_n \in (a_n,b_n)\}$.
Review of moment generating functions
Definition 0.2: The moment generating function (mgf) of X is defined by
$$M_X(t) = E\left[e^{tX}\right], \qquad (9)$$
which exists only if the indicated expectation exists for t in a neighborhood of 0.
Three usages of the moment generating function include:
(1) Find moments of random variables.
(2) Identify pdf's of random variables (the mgf is unique).
(3) Establish properties of random variables.
Note: In Definition 0.2, if the moment generating function of X is finite (exist) in a neighborhood of 0 (an
open interval about 0), then this function completely determines the distribution of X.
Theorem 0.2: Mgfs are unique, i.e., $M_X = M_Y$ iff $F_X = F_Y$.
Theorem 0.3: If $M_X$ exists, then moments of all orders exist and
$$M_X(t) = \sum_{r=0}^{\infty} \frac{E[X^r]\, t^r}{r!} \qquad (10)$$
and
$$E[X^r] = \left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0}, \qquad r = 1, 2, \ldots \qquad (11)$$
Example 0.3: $X \sim \text{geometric}(p)$ if
$$p_X(x) = p(1-p)^{x-1}, \qquad x = 1, 2, \ldots,\quad 0 < p < 1.$$
A. Find the mgf of X.
B. Use the mgf to find E(X) and Var(X).
C. Find $F_X(x)$.
Example 0.4: A kind of capacitor has a defect rate of 100 ppm (parts per million). Assume that whether
one is defective is independent of whether any other is.
A. What is the expected number of capacitors that would have to be inspected in order to find one
defective?
B. What is the probability that a defective capacitor would be found at or before the 10,000th
inspection?
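A minimal numerical sketch of Example 0.4 (not part of the original notes), assuming the geometric model of Example 0.3 with p = 100 ppm; the values used below are illustrative of parts A and B.

```python
# Example 0.4 sketch: inspections until the first defective capacitor, X ~ geometric(p).
p = 100e-6                                # defect rate: 100 parts per million
expected_inspections = 1 / p              # E[X] = 1/p = 10,000
prob_by_10000 = 1 - (1 - p) ** 10_000     # P(X <= 10,000) = 1 - (1-p)^10000
print(expected_inspections)               # 10000.0
print(prob_by_10000)                      # ~0.632 (close to 1 - e^{-1})
```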
Theorem 0.4: If $Y = aX + b$, then
$$M_Y(t) = e^{bt}\, M_X(at). \qquad (12)$$
Example 0.5: Show that (1) if $X \sim N(0,1)$, then $M_X(t) = e^{t^2/2}$, and (2) if $Y \sim N(\mu, \sigma^2)$, then $M_Y(t) = e^{\mu t + \sigma^2 t^2/2}$. Use Theorem 0.4 to show that $Z = aY + b \sim N(a\mu + b,\ a^2\sigma^2)$.
Solution:
Definition 0.3: The joint mgf of jointly distributed random variables $X_1, X_2, \ldots, X_n$ is defined by
$$M_{X_1,\ldots,X_n}(t_1,\ldots,t_n) = E\left[e^{t_1 X_1 + \cdots + t_n X_n}\right], \qquad (13)$$
or in equivalent vector form
$$M_{\mathbf{X}}(\mathbf{t}) = E\left[e^{\mathbf{t}'\mathbf{X}}\right]. \qquad (14)$$
The joint mgf is said to exist iff the expectation in (13) and (14) exists for all $\mathbf{t}$ in a neighborhood of $\mathbf{0}$.
Theorem 0.5: As in the univariate case, joint mgfs are unique.
Theorem 0.6: For nonnegative integers r and s (bivariate case),
$$E\left[X_1^r X_2^s\right] = \left.\frac{\partial^{\,r+s} M_{X_1,X_2}(t_1,t_2)}{\partial t_1^r\, \partial t_2^s}\right|_{(t_1,t_2)=(0,0)}. \qquad (16)$$
Theorem 0.8: If $X_1, X_2, \ldots, X_n$ are independent, then
$$M_{\sum_{i=1}^{n} X_i}(t) = \prod_{i=1}^{n} M_{X_i}(t).$$
If, in addition, $a_1, \ldots, a_n$ are constants with $X_i \sim N(\mu_i, \sigma_i^2)$, then
$$\sum_{i=1}^{n} a_i X_i \sim N\!\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right).$$
The gamma and chi‐square distributions
Definition 0.4: The gamma function is defined by
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \qquad \alpha > 0. \qquad (18)$$
Some properties of the gamma function:
1. $\Gamma(\alpha)$ does not exist in closed form unless $\alpha$ is a positive integer, in which case $\Gamma(n) = (n-1)!$.
2. Recursive property: $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$.
3. $\Gamma\!\left(\tfrac12\right) = \sqrt{\pi}$, which combined with the recursive property gives
$$\Gamma\!\left(n + \tfrac12\right) = \frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n}\,\sqrt{\pi} \qquad \text{for any positive integer } n.$$
Definition 0.5: X has a gamma distribution with shape parameter α and scale parameter θ [write $X \sim \Gamma(\alpha,\theta)$] if
$$f_X(x) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\, x^{\alpha-1} e^{-x/\theta}, \qquad x > 0,\ \alpha, \theta > 0. \qquad (19)$$
Note: The text uses its own (different) notation for the two parameters.
Remarks:
1. The exponential distribution is a special case of the gamma distribution; specifically, $\Gamma(1,\theta) = \mathrm{EXP}(\theta)$.
2. The gamma CDF cannot be expressed in closed form in general. Computer packages can give numerical values. When $\alpha = n$, a positive integer, it can be shown that (p. 113):
$$F_X(x) = 1 - \sum_{k=0}^{n-1} \frac{(x/\theta)^k}{k!}\, e^{-x/\theta} = 1 - F_Y(n-1), \qquad \text{where } Y \sim \text{Poisson}(x/\theta). \qquad (20)$$
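A minimal numerical check of relation (20), using illustrative values n = 5, θ = 2, x = 7.5 (not from the notes); scipy's gamma distribution takes the shape as `a` and the scale as `scale`.

```python
# Check (20): F_X(x) = 1 - F_Y(n-1) with X ~ GAM(shape=n, scale=theta), Y ~ Poisson(x/theta).
from scipy import stats

n, theta, x = 5, 2.0, 7.5                          # illustrative values
lhs = stats.gamma.cdf(x, a=n, scale=theta)         # F_X(x)
rhs = 1 - stats.poisson.cdf(n - 1, mu=x / theta)   # 1 - F_Y(n-1)
print(lhs, rhs)                                    # the two numbers agree
```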
Theorem 0.9a: If $X \sim \Gamma(\alpha,\theta)$, then
$$E[X] = \alpha\theta, \qquad \operatorname{Var}(X) = \alpha\theta^2.$$
Theorem 0.9b: If $X \sim \Gamma(\alpha,\theta)$, then
$$M_X(t) = (1 - \theta t)^{-\alpha}, \qquad t < 1/\theta.$$
Theorem 0.10: If $X \sim \Gamma(\alpha,\theta)$, then
$$E[X^r] = \frac{\theta^r\,\Gamma(r+\alpha)}{\Gamma(\alpha)}.$$
If $X_1,\ldots,X_n$ are independent with $X_i \sim \Gamma(\alpha_i,\theta)$, then
$$\sum_{i=1}^{n} X_i \sim \Gamma\!\left(\sum_{i=1}^{n}\alpha_i,\ \theta\right);$$
in particular, if the $X_i$ are iid $\mathrm{EXP}(\theta) = \Gamma(1,\theta)$, then $\sum_{i=1}^{n} X_i \sim \Gamma(n,\theta)$.
Example 0.8: Suppose a machine has exponential life with mean time to failure (MTTF) of 300 h. When it breaks down, assume that it is immediately replaced by an identical machine or repaired to its previous state. Also, assume that the times between breakdowns are independent.
A. What is the distribution of the time to the fifth breakdown?
B. Assuming that the machines operate continuously, what is the probability that the fifth breakdown occurs after one month (1 mo. = 720 h)?
Chi‐Square Distribution:
Definition 0.6: X has a chi‐square distribution with ν degrees of freedom (dof) [write $X \sim \chi^2(\nu)$] if $X \sim \Gamma\!\left(\tfrac{\nu}{2}, 2\right)$, i.e., if
$$f_X(x) = \frac{1}{\Gamma(\nu/2)\,2^{\nu/2}}\, x^{\nu/2 - 1} e^{-x/2}, \qquad x > 0,\ \nu > 0.$$
Note: ν is usually an integer in the context of the chi‐square distribution.
Since the chi‐square is a special case of the gamma distribution, Corollaries 0.3 – 0.5 below follow
readily from the theorems on the gamma distribution.
Corollary 0.3: If $X \sim \chi^2(\nu)$, then
$$E[X] = \nu, \qquad \operatorname{Var}(X) = 2\nu.$$
Corollary 0.4: If $X \sim \chi^2(\nu)$, then
$$M_X(t) = (1-2t)^{-\nu/2}, \qquad t < \tfrac12.$$
Corollary 0.5: If $X \sim \Gamma(\alpha,\theta)$, then $\dfrac{2X}{\theta} \sim \chi^2(2\alpha)$.
If $X_1,\ldots,X_n$ are independent with $X_i \sim \mathrm{EXP}(\theta_i)$, then $\displaystyle\sum_{i=1}^{n}\frac{2X_i}{\theta_i} \sim \chi^2(2n)$.
Proof: (By MGF)
Corollary 0.7: If $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$, then
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2(n).$$
Proof: (By MGF)
Example 0.9: Redo Example 0.8.B using the $\chi^2$ distribution.
(Assuming that the machines operate continuously, what is the probability that the fifth breakdown occurs after one month, 1 mo. = 720 h?)
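A hedged numerical sketch of Examples 0.8.B / 0.9 under the stated model (iid EXP(300 h) times between breakdowns, so the time to the fifth breakdown is Γ(5, 300)); the three lines compute the same probability via the gamma, chi-square, and Poisson routes.

```python
# Time to the 5th breakdown: T ~ GAM(shape=5, scale=300); want P(T > 720).
from scipy import stats

via_gamma = stats.gamma.sf(720, a=5, scale=300)     # P(T > 720) directly
via_chi2 = stats.chi2.sf(2 * 720 / 300, df=10)      # since 2T/theta ~ chi^2(2*5) (Corollary 0.5)
via_poisson = stats.poisson.cdf(4, mu=720 / 300)    # at most 4 breakdowns in [0, 720] h
print(via_gamma, via_chi2, via_poisson)             # all three agree (~0.904)
```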
Summary:
Transformation Method
Any real‐valued function of a rv X is itself a rv, the distribution of which is determined by the distribution of X.
Three main techniques for finding the distribution of $Y = u(X)$:
1) CDF Approach
(Ex 0.10: Let $f_X(x) = a e^{-ax}$, $x > 0$, and $Y = e^{bX}$, with $a, b > 0$. Find $f_Y(y)$.)
Sol: $F_X(x) = 1 - e^{-ax}$, $0 < x < \infty$.
$$F_Y(y) = P(Y \le y) = P\!\left(e^{bX} \le y\right) = P\!\left(X \le \tfrac{1}{b}\ln y\right) = F_X\!\left(\tfrac{1}{b}\ln y\right) = 1 - e^{-\frac{a}{b}\ln y} = 1 - y^{-a/b}$$
$$\therefore\ f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{a}{b}\, y^{-\frac{a}{b}-1} \equiv c\, y^{-c-1}, \qquad 1 < y < \infty,\quad c = \frac{a}{b}.$$
Define $A = \{x \mid u(x) \le y\}$.
The CDF approach is to find
$$F_Y(y) = P[u(X) \le y]$$ by solving for it in terms of $F_X$, e.g. $F_Y(y) = F_X(u^{-1}(y))$.
Thm 0.14: Let $\mathbf{X} = (X_1,\ldots,X_k)$ be a k‐dimensional vector of continuous r.v.'s with joint pdf $f_{\mathbf{X}}(x_1,\ldots,x_k)$. If $Y = u(\mathbf{X})$, then
$$F_Y(y) = P[u(\mathbf{X}) \le y] = \int\cdots\int_{A} f_{\mathbf{X}}(x_1,\ldots,x_k)\,dx_1\cdots dx_k, \qquad A = \{\mathbf{x}\mid u(\mathbf{x}) \le y\}.$$
The relationship between x and y can be:
1) one‐to‐one — ❶ discrete or ❷ continuous;
2) not one‐to‐one — ❸ n‐to‐one (e.g., $y = x^2$ is 2‐to‐one); a one‐to‐n relation is not a well‐defined function; ❹ n!‐to‐one (the order‐statistics case).
❶ One‐to‐one (discrete)
Thm 0.15: If X is a discrete rv with pmf $f_X(x)$ and $Y = u(X)$ is one‐to‐one, i.e., $y = u(x) \Leftrightarrow x = u^{-1}(y) = w(y)$, then the pmf of Y is:
$$f_Y(y) = f_X(w(y)), \qquad y \in B,\quad B = \{y \mid f_Y(y) > 0\}.$$
Ex 0.12: Suppose $X \sim \mathrm{BIN}(n,p)$. Find the pmf of $Y = n - X$.
Sol: We know $P_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$.
Find the inverse function $w(\cdot)$: $Y = n - X \Rightarrow X = n - Y = w(Y)$.
$$\therefore\ P_Y(y) = P_X(w(y)) = \binom{n}{n-y}\, p^{\,n-y}(1-p)^{y} = \binom{n}{y}\, q^{\,y}(1-q)^{n-y}, \qquad q = 1-p.$$
❷ One‐to‐one (continuous)
Thm 0.16: If X is a continuous rv with pdf $f_X(x)$ and $Y = u(X)$ is one‐to‐one from $A = \{x\mid f_X(x) > 0\}$ onto $B = \{y\mid f_Y(y) > 0\}$, and if the derivative $\frac{dw(y)}{dy}$ is continuous and nonzero on B, then the pdf of Y is:
$$f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right|, \qquad y \in B.$$
Ex 0.13: In Ex 0.10, $f_X(x) = a e^{-ax}$, $Y = e^{bX}$, and the CDF approach gave $f_Y(y) = \frac{a}{b}\,y^{-\frac{a}{b}-1}$. Now find $f_Y(y)$ using the transformation method.
Sol: STEP 1: Find the inverse $x = w(y) = \frac{1}{b}\ln y$.
STEP 2: Take the derivative: $\frac{dw(y)}{dy} = \frac{1}{by}$.
STEP 3: $f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right| = a\,e^{-\frac{a}{b}\ln y}\cdot\frac{1}{by} = \frac{a}{b}\,y^{-\frac{a}{b}-1}$.  #
❸ Not one‐to‐one:
Discrete: $f_Y(y) = \sum_j f_X(w_j(y))$
Continuous: $f_Y(y) = \sum_j f_X(w_j(y))\left|\frac{d}{dy}w_j(y)\right|$
E.g. (discrete, $Y = X^2$, with $f_X(x) = \frac{4}{31}\,2^{x}$ for $x = -2,-1,0,1,2$):
$$f_Y(0) = f_X(w(0)) = f_X(0) = \frac{4}{31}$$
$$f_Y(1) = f_X(w_1(1)) + f_X(w_2(1)) = f_X(-1) + f_X(1) = \frac{4}{31}\cdot\frac{1}{2} + \frac{4}{31}\cdot 2 = \frac{10}{31}$$
$$f_Y(4) = f_X(w_1(4)) + f_X(w_2(4)) = f_X(-2) + f_X(2) = \frac{4}{31}\cdot\frac{1}{4} + \frac{4}{31}\cdot 4 = \frac{17}{31} \quad\#$$
Joint Transformations:
Thm 0.17: If $\mathbf{X}$ is a vector of rv's with joint pdf $f_{\mathbf{X}}(\mathbf{x})$ and $\mathbf{Y} = (Y_1,\ldots,Y_k) = U(\mathbf{X}) = (u_1(\mathbf{X}),\ldots,u_k(\mathbf{X}))$ defines a one‐to‐one transformation, then the joint pdf of $\mathbf{Y}$ is (with Jacobian $|J|$):
Continuous: $f_{\mathbf{Y}}(y_1,\ldots,y_k) = f_{\mathbf{X}}(x_1,\ldots,x_k)\,|J|$
Discrete: $f_{\mathbf{Y}}(y_1,\ldots,y_k) = f_{\mathbf{X}}(x_1,\ldots,x_k)$
where $(x_1,\ldots,x_k)$ is the solution of $\mathbf{y} = U(\mathbf{x})$.
E.g.: $Y_1 = X_1$, $Y_2 = X_1 + X_2$, with $X_1, X_2$ iid EXP(1), so $f_{X_1,X_2}(x_1,x_2) = e^{-x_1-x_2}$ on $A = \{(x_1,x_2): 0 < x_1,\ 0 < x_2\}$.
1) Invert: $x_1 = y_1$, $x_2 = y_2 - y_1$; the region becomes $0 < y_1 < y_2$.
2) $f_{Y_1,Y_2}(y_1,y_2) = f_{X_1,X_2}(x_1,x_2)\,|J|$, with
$$|J| = \left|\det\begin{bmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[4pt] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{bmatrix}\right| = \left|\det\begin{bmatrix}1 & 0\\ -1 & 1\end{bmatrix}\right| = 1,$$
so
$$f_{Y_1,Y_2}(y_1,y_2) = f_{X_1}(y_1)\,f_{X_2}(y_2 - y_1)\cdot 1 = e^{-y_1}\,e^{-(y_2-y_1)} = e^{-y_2}, \qquad 0 < y_1 < y_2.$$
Ex 0.16: (Let X have pdf $f_X(x) = \frac{x^2}{3}$, $-1 < x < 2$, and let $Y = X^2$.)
Sol:
a) Within $0 < x < 1$, $Y = X^2$ is 1‐to‐1, so
$$f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right| = f_X(\sqrt{y})\,\frac{1}{2\sqrt{y}}.$$
Note: $Y = X^2 \Rightarrow X^2 = y \Rightarrow X = \sqrt{y} = w(y)$ (positive root, since $0 < x < 1$).
b) On $-1 < x < 1$, $Y = X^2$ is 2‐to‐1; on $1 < x < 2$, $Y = X^2$ is 1‐to‐1. Split $-1 < x < 1$:
$$-1 < x < 0:\ x = -\sqrt{y} = w_1(y) \Rightarrow f_X(-\sqrt{y})\left|\frac{d(-\sqrt{y})}{dy}\right| = \frac{y}{3}\cdot\frac{1}{2\sqrt{y}} = \frac{\sqrt{y}}{6}, \quad 0 < y < 1$$
$$0 < x < 1:\ x = \sqrt{y} = w_2(y) \Rightarrow f_X(\sqrt{y})\left|\frac{d\sqrt{y}}{dy}\right| = \frac{y}{3}\cdot\frac{1}{2\sqrt{y}} = \frac{\sqrt{y}}{6}, \quad 0 < y < 1$$
$$1 < x < 2:\ x = \sqrt{y} = w(y) \Rightarrow f_X(\sqrt{y})\left|\frac{d\sqrt{y}}{dy}\right| = \frac{\sqrt{y}}{6}, \quad 1 < y < 4$$
$$\therefore\ f_Y(y) = \begin{cases} \dfrac{\sqrt{y}}{6} + \dfrac{\sqrt{y}}{6} = \dfrac{\sqrt{y}}{3}, & 0 < y < 1\\[6pt] \dfrac{\sqrt{y}}{6}, & 1 < y < 4 \end{cases}$$
Part b) can also be solved using the CDF approach:
When $-1 < x < 1$ (so $0 < y < 1$): $F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y})$
$$\therefore\ f_Y(y) = \frac{dF_Y(y)}{dy} = f_X(\sqrt{y})\cdot\frac{1}{2\sqrt{y}} + f_X(-\sqrt{y})\cdot\frac{1}{2\sqrt{y}}$$
When $1 < x < 2$ (so $1 < y < 4$): $F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(X \le \sqrt{y}) = F_X(\sqrt{y})$
$$\therefore\ f_Y(y) = \frac{dF_Y(y)}{dy} = f_X(\sqrt{y})\cdot\frac{1}{2\sqrt{y}}$$
#
Sol: Idea (convolution of two independent UNIF(0,1) r.v.'s, $U = X_1 + X_2$):
$$f_U(u) = \begin{cases} \displaystyle\int_0^{u} dv = u, & 0 < u < 1\\[6pt] \displaystyle\int_{u-1}^{1} dv = 2 - u, & 1 < u < 2 \end{cases} \qquad\#$$
Summary:
*** One‐to‐one: $Y = u(X)$, $X = u^{-1}(Y) = w(Y)$, $A = \{x\mid f_X(x) > 0\}$, $B = \{y\mid f_Y(y) > 0\}$
Discrete: $f_Y(y) = f_X(w(y))$, $y \in B$
Continuous: $f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right|$, $y \in B$
*** Not one‐to‐one: Partition A into disjoint subsets $A_1, A_2, \ldots, A_m$
Discrete: $f_Y(y) = \sum_j f_X(w_j(y))$
Continuous: $f_Y(y) = \sum_j f_X(w_j(y))\left|\frac{d}{dy}w_j(y)\right|$
*** Joint Distribution:
Discrete: $f_{\mathbf{Y}}(y_1,\ldots,y_k) = f_{\mathbf{X}}(x_1,\ldots,x_k)$; if not one‐to‐one, partition A into $A_1,\ldots,A_m$ and sum.
Continuous: $f_{\mathbf{Y}}(y_1,\ldots,y_k) = f_{\mathbf{X}}(x_1,\ldots,x_k)\,|J|$, where
$$J = \det\begin{bmatrix}\dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_k}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial x_k}{\partial y_1} & \cdots & \dfrac{\partial x_k}{\partial y_k}\end{bmatrix};$$
if not one‐to‐one: $f_{\mathbf{Y}}(y_1,\ldots,y_k) = \sum_j f_{\mathbf{X}}(x_{1j},\ldots,x_{kj})\,|J_j|$.
Order statistics
For n = 3, the map from $(x_1,x_2,x_3)$ to the ordered values $(y_1,y_2,y_3)$ is 3!-to-one, so
$$f_{Y_1,Y_2,Y_3}(y_1,y_2,y_3) = \sum_j f_{X_1,X_2,X_3}(x_1,x_2,x_3)\,|J_j| = \sum_j f(x_1)f(x_2)f(x_3)\,|J_j| = 3!\,f(y_1)f(y_2)f(y_3),$$
since each inverse transformation is a permutation with Jacobian (e.g.)
$$|J| = \left|\det\begin{bmatrix}0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\right| = |1| = 1.$$
In general, the joint pdf of the order statistics of n iid (independent, identically distributed) r.v.'s is:
$$f_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n) = n!\,\prod_{i=1}^{n} f(y_i), \qquad y_1 < y_2 < \cdots < y_n.$$
Thm 0.18: If the $x_i$'s are an iid sample from a population with continuous pdf f(x), then the joint pdf of the order statistics $Y_1 < \cdots < Y_n$ is
$$g(y_1,\ldots,y_n) = n!\, f(y_1)\cdots f(y_n)$$
for $y_1 < y_2 < \cdots < y_n$, and zero otherwise.
***What is $g_k(y_k)$, the marginal pdf of the kth order statistic?
$$g_k(y_k) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(y_k)]^{k-1}\,[1 - F(y_k)]^{n-k}\, f(y_k)$$
***How about the CDF? Let's focus on the min and max.
CDF of $Y_1$ (min):
$$G_1(y) = P(Y_1 \le y) = 1 - P(Y_1 > y) = 1 - P(\min(x_1,\ldots,x_n) > y) = 1 - P(x_1 > y,\ldots,x_n > y)$$
$$= 1 - P(x_1 > y)\cdots P(x_n > y) = 1 - [1 - F(y)]\cdots[1 - F(y)] = 1 - [1 - F(y)]^n$$
CDF of $Y_n$ (max):
$$G_n(y) = P(Y_n \le y) = P(\max(x_1,\ldots,x_n) \le y) = P(x_1 \le y,\ldots,x_n \le y) = P(x_1 \le y)\cdots P(x_n \le y) = [F(y)]^n$$
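A small Monte Carlo sketch (illustrative, not part of the notes) checking $G_1(y) = 1-[1-F(y)]^n$ and $G_n(y) = [F(y)]^n$ for a UNIF(0,1) sample.

```python
# Monte Carlo check of the min/max CDFs for n iid UNIF(0,1) observations.
import numpy as np

rng = np.random.default_rng(0)
n, reps, y = 5, 100_000, 0.3
x = rng.uniform(size=(reps, n))
print((x.min(axis=1) <= y).mean(), 1 - (1 - y) ** n)   # G_1(y) = 1 - (1-y)^n
print((x.max(axis=1) <= y).mean(), y ** n)             # G_n(y) = y^n
```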
Chapter 1. Sampling Distribution
Probability vs Statistics:
Def. 1.1: A function of observable r.v.'s, $T = t(x_1,\ldots,x_n)$, which does not depend on any unknown parameters, is called a statistic.
Examples: $\bar{x} = \frac{\sum_i x_i}{n}$ and $T = \sum_i x_i$ are statistics (n is known); $\frac{\sum_i(x_i-\mu)}{\sigma}$ is not a statistic if σ (or μ) is unknown; $T = x_{n:n} - x_{1:n}$ (the sample range) is a statistic.
Def 1.2: The set of r.v.'s $x_1,\ldots,x_n$ is said to be a random sample (rs) of size n from a population with density function f(x) if the joint pdf has the form:
$$f_{x_1,\ldots,x_n}(x_1,\ldots,x_n) = f(x_1)\cdots f(x_n)$$
Ex. 1.1: Let $x_1,\ldots,x_n$ be a rs of size $n = 2k$ from EXP(θ). Find $f_{x_1,\ldots,x_n}(x_1,\ldots,x_n)$.
Solution: $f(x_i) = \frac{1}{\theta}e^{-x_i/\theta}$ for all i, and the $x_i$'s are a rs, so
$$f_{x_1,\ldots,x_n}(x_1,\ldots,x_n) = f(x_1)\cdots f(x_n) = \frac{1}{\theta^{n}}\,e^{-\sum_i x_i/\theta}, \qquad x_i > 0\ \forall i. \quad\#$$
Ex. 1.2: Let $x_1,\ldots,x_n$ be a rs of size $n = 2k$ taken from U(0,1).
Find $P\!\left(x_1 < \tfrac12,\ x_2 > \tfrac12,\ x_3 < \tfrac12,\ x_4 > \tfrac12,\ \ldots,\ x_{2k} > \tfrac12\right)$.
Solution: Since the $x_i$'s are a rs,
$$P\!\left(x_1 < \tfrac12,\ x_2 > \tfrac12,\ \ldots,\ x_{2k} > \tfrac12\right) = P\!\left(x_1 < \tfrac12\right)P\!\left(x_2 > \tfrac12\right)\cdots P\!\left(x_{2k} > \tfrac12\right) = \left(\tfrac12\right)^{2k} = 4^{-k}. \quad\#$$
Def 1.3: The sample mean is a r.v. defined as $\bar x = \frac{\sum_{i=1}^{n} x_i}{n}$. (Note: $E[\bar x]$ is not a r.v.)
Thm 1.1: If $x_1,\ldots,x_n$ is a rs from f(x) with $E[x_i] = \mu$ and $\operatorname{Var}(x_i) = \sigma^2$, then $E[\bar x] = \mu$ and $\operatorname{Var}(\bar x) = \frac{\sigma^2}{n}$.
Proof:
1. $E[\bar x] = E\!\left[\frac{\sum_i x_i}{n}\right]$ (substitute the definition of $\bar x$)
2. $= \frac{1}{n}\sum_i E[x_i]$, since expectation is linear ($\int\sum_i f\,dx = \sum_i\int f\,dx$); this holds even if the $x_i$'s are not independent
3. $= \frac{1}{n}\sum_i \mu$ (substitute $E[x_i] = \mu$)
4. $= \mu$ (definition of the summation)  #
Proof summary: $E[\bar x] = E\!\left[\frac{\sum x_i}{n}\right] = \frac{1}{n}\sum E[x_i] = \frac{1}{n}\cdot n\mu = \mu$, even if the $x_i$'s are not independent.
FOLLOWING SIMILAR LOGIC:
$$\operatorname{Var}(\bar x) = \operatorname{Var}\!\left(\frac{\sum_i x_i}{n}\right) = \frac{1}{n^2}\operatorname{Var}\!\left(\sum_i x_i\right) = \frac{1}{n^2}\sum_i\operatorname{Var}(x_i) \quad(\text{the last step ONLY works if the } x_i\text{'s are independent}) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n} \quad\#$$
Def 1.4: An estimator T is said to be an unbiased estimator of $\tau(\theta)$ if $E[T] = \tau(\theta)$ for all $\theta \in \Omega$.
Examples of $\tau(\theta)$: $\sigma$, $\ln\sigma$, $e^{\sigma}$, $\ln\sigma + e^{\sigma}$.
How to estimate $\sigma^2$? First, $\frac{\sum_i x_i}{n} \to \mu$.
Not a statistic: $\frac{\sum_i (x_i - \mu)^2}{n} \to \sigma^2$, because $E[(x_i-\mu)^2] = \sigma^2$.
Statistic: $S^2 = \frac{\sum_i (x_i - \bar x)^2}{n-1} \to \sigma^2$.
Is $E[S^2] = \sigma^2$? Yes:
$$E\!\left[\frac{\sum_i (x_i - \bar x)^2}{n-1}\right] = \frac{1}{n-1}\,(n-1)\,\sigma^2 = \sigma^2.$$
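A simulation sketch (illustrative values, not from the notes) comparing the $n-1$ and $n$ divisors; numpy's `ddof=1` option gives the $n-1$ divisor used in $S^2$.

```python
# Simulation: S^2 (divide by n-1) is unbiased for sigma^2; dividing by n underestimates it.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))
print(x.var(axis=1, ddof=1).mean())   # ~4.0 = sigma^2 (unbiased)
print(x.var(axis=1, ddof=0).mean())   # ~3.6 = (n-1)/n * sigma^2 (biased)
```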
IF the $x_i$'s are not iid (e.g., if there is a correlation between $x_i$ and $x_{i+j}$):
$$\operatorname{Var}(\bar x) = \operatorname{Var}\!\left(\frac{\sum_i x_i}{n}\right) = \frac{1}{n^2}\left[\sum_i\operatorname{Var}(x_i) + 2\sum_{i<k}\operatorname{Cov}(x_i,x_k)\right] = \frac{1}{n^2}\left[n\sigma^2 + 2\sum_{j=1}^{n-1}(n-j)\,c_j\right]$$
Substep: write $c_j = \operatorname{Cov}(x_i, x_{i+j}) = \rho_j\,\sigma^2$. Then
$$\operatorname{Var}(\bar x) = \frac{1}{n^2}\left[n\sigma^2 + 2\sum_{j=1}^{n-1}(n-j)\,\sigma^2\rho_j\right] = \frac{\sigma^2}{n}\left[1 + 2\sum_{j=1}^{n-1}\left(1 - \frac{j}{n}\right)\rho_j\right]$$
If $\rho_j > 0$: bigger variance than $\sigma^2/n$ (not good).
If $\rho_j < 0$: smaller variance than $\sigma^2/n$ (good).
Writing
$$\operatorname{Var}(\bar x) = \frac{\sigma^2}{\,n\Big/\!\left[1 + 2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\rho_j\right]} = \frac{\sigma^2}{n_{\mathrm{eff}}},$$
$n_{\mathrm{eff}}$ is the effective sample size.
SUMMARY: $E[\bar x] = \mu$, so $\mu \to \hat\mu = \bar x = \frac{\sum x_i}{n} = M_1'$; $\operatorname{Var}(\bar x) = \frac{\sigma^2}{n}$; $\sigma^2 \to S^2 = \frac{\sum(x_i-\bar x)^2}{n-1}$.
Notes: "~" means "follows"; "$\overset{d}{\to}$" means convergence in distribution.
Def 1.5: Normal Distribution: If
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$
then $X \sim N(\mu,\sigma^2)$, where $-\infty < \mu < \infty$ and $\sigma > 0$. If $\mu = 0$ and $\sigma = 1$, then x is called a standard normal r.v. with
$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$
Thm 8.3 (Central Limit Theorem, CLT): If $x_1,\ldots,x_n$ is a rs from a distribution with mean μ and variance $\sigma^2 < \infty$, then the limiting distribution of
$$Z_n = \frac{\sum_i x_i - n\mu}{\sigma\sqrt{n}} = \frac{\bar x - \mu}{\sigma/\sqrt{n}} \quad\text{is}\quad Z \sim N(0,1).$$
(Example: a rs of size $n = 25$ from a distribution with $\mu = 60$ and $\sigma^2 = 36$; find c such that $P(\bar x > c) = 0.95$.)
$$\therefore\ P(\bar x > c) = 1 - P(\bar x \le c) = 1 - P\!\left[\frac{\bar x - 60}{\sqrt{36/25}} \le \frac{c - 60}{\sqrt{36/25}}\right] \approx 1 - \Phi\!\left(\frac{c - 60}{6/5}\right) = 0.95$$
$$\Rightarrow \Phi\!\left(\frac{c-60}{6/5}\right) = 0.05 \Rightarrow \frac{c-60}{6/5} = Z_{0.05}$$
Now look at the table to find values and use linear interpolation to get $Z_{0.05}$: between $-1.65$ and $-1.64$, so $Z_{0.05} \approx -1.645$.
$$\therefore\ c = 60 - 1.645\cdot\frac{6}{5} = 58.026 \quad\#$$
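A one-line check of the example above (μ = 60, σ² = 36, n = 25) using scipy's normal quantile function.

```python
# c = 60 + z_{0.05} * (6/5), where z_{0.05} = Phi^{-1}(0.05) ~ -1.645.
from scipy import stats

c = 60 + stats.norm.ppf(0.05) * (6 / 25 ** 0.5)
print(c)   # ~58.03, matching the hand computation
```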
Ex 1.4: Suppose $Z \sim N(0,1)$.
❶ $X = Z^2 \sim \Gamma\!\left(\tfrac12, 2\right) = \chi^2(1)$.
❷ If $x \sim \Gamma(\alpha,\beta)$, then $F(x) = H\!\left(\frac{2x}{\beta}\right)$, where F is the CDF of $\Gamma(\alpha,\beta)$ and H is the CDF of $\chi^2(2\alpha)$.
❸ Why learn $\chi^2$? (1) $\chi^2(n)$ arises as $\sum_{i=1}^{n} e_i^2$, where $e_i \sim N(0,1)$; (2) it is related to the distribution of $S^2 = \frac{\sum_i (x_i - \bar x)^2}{n-1}$.
Thm 1.5: If $x_1,\ldots,x_n$ is a r.s. from $N(\mu,\sigma^2)$, then
❶ $\bar x$ and the terms $x_i - \bar x$, $i = 1,\ldots,n$, are independent;
❷ $\bar x$ and $S^2$ are independent;
❸ $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$  ← $n-1$ dof (degrees of freedom)
Proof:
1) Use the transformation $y_1 = \bar x$, $y_i = x_i - \bar x$ ($i = 2,\ldots,n$); the joint pdf factors as $f_{y_1,\ldots,y_n} = f_{y_1}\cdot f_{y_2,\ldots,y_n}$.
2) $S^2$ is a function of $x_i - \bar x$, $i = 1,\ldots,n$ ⟹ $\bar x \perp S^2$.
3) Let
$$v = \frac{\sum_i (x_i-\mu)^2}{\sigma^2} = \frac{\sum_i (x_i - \bar x + \bar x - \mu)^2}{\sigma^2} = \frac{\sum_i (x_i-\bar x)^2}{\sigma^2} + \frac{n(\bar x - \mu)^2}{\sigma^2} = v_1 + v_2.$$
Since $\frac{x_i-\mu}{\sigma} \sim N(0,1)$, $\left(\frac{x_i-\mu}{\sigma}\right)^2 \sim \chi^2(1)$, so $v \sim \chi^2(n)$; and $\frac{\bar x - \mu}{\sigma/\sqrt n} \sim N(0,1)$, so $v_2 = \left(\frac{\bar x - \mu}{\sigma/\sqrt n}\right)^2 \sim \chi^2(1)$. Since $v_1 \perp v_2$,
$$M_v(t) = M_{v_1}(t)\,M_{v_2}(t) \Rightarrow (1-2t)^{-n/2} = M_{v_1}(t)\,(1-2t)^{-1/2} \Rightarrow M_{v_1}(t) = (1-2t)^{-(n-1)/2}$$
$$\therefore\ v_1 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1). \quad\#$$
We can compute the percentiles of $S^2$ by using the chi‐square table, i.e.,
$$\gamma = P\!\left[\frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\gamma}(n-1)\right] = P\!\left[S^2 \le \frac{\sigma^2\,\chi^2_{\gamma}(n-1)}{n-1}\right],$$
where $\chi^2_{\gamma}(\nu)$ is the $100\gamma$ percentile of the chi‐square distribution with ν dof. So the $100\gamma$ percentile of the distribution of $S^2$, say $c_\gamma$, is:
$$c_\gamma = \frac{\sigma^2\,\chi^2_{\gamma}(n-1)}{n-1}.$$
Ex. 1.5: Let $Z \sim N(0,1)$.
a) Find $P(Z^2 < 3.84)$ using the Normal table.
b) Find $P(Z^2 < 3.84)$ using the Chi‐Square table.
Sol: a) $P(Z^2 < 3.84) = P(-\sqrt{3.84} < Z < \sqrt{3.84}) = P(|Z| < \sqrt{3.84}) = P(|Z| < 1.96)$.
From the Normal table, $Z_{1-\alpha/2} = 1.96 \Rightarrow 1 - \frac{\alpha}{2} = 0.975 \Rightarrow \alpha = 0.05$
$\Rightarrow P(Z^2 < 3.84) = 1 - \alpha = 1 - 0.05 = 0.95$.
b) First note: $Z \sim N(0,1) \Rightarrow Z^2 \sim \chi^2(1)$ (dof = 1), so
$P(Z^2 < 3.84) = P(\chi^2(1) < 3.84) = 0.95$.  #
Snedecor’s F Distribution: developed by George W. Snedecor; the "F" commemorates Sir Ronald Fisher.
Thm. 1.6: If $v_1 \sim \chi^2(\gamma_1)$ and $v_2 \sim \chi^2(\gamma_2)$ are independent, then $F = \dfrac{v_1/\gamma_1}{v_2/\gamma_2} \sim F(\gamma_1,\gamma_2)$ has the following pdf:
$$g(x;\gamma_1,\gamma_2) = \frac{\Gamma\!\left(\frac{\gamma_1+\gamma_2}{2}\right)}{\Gamma\!\left(\frac{\gamma_1}{2}\right)\Gamma\!\left(\frac{\gamma_2}{2}\right)}\left(\frac{\gamma_1}{\gamma_2}\right)^{\gamma_1/2} x^{\gamma_1/2 - 1}\left(1 + \frac{\gamma_1}{\gamma_2}x\right)^{-\frac{\gamma_1+\gamma_2}{2}}, \qquad x > 0.$$
Why F?
Answer: When we compare $\sigma_1^2$ and $\sigma_2^2$: since $v_i = \dfrac{(n_i-1)S_i^2}{\sigma_i^2} \sim \chi^2(n_i-1)$,
$$F_{n_1-1,\,n_2-1} = \frac{\dfrac{(n_1-1)S_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)S_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F(n_1-1,\ n_2-1).$$
Percentile: $P[X \le f_\gamma(\gamma_1,\gamma_2)] = \gamma$.
For small values of γ, e.g., $\gamma = 0.01$, we can use the fact that if $X \sim F(\gamma_1,\gamma_2)$, then $Y = \frac{1}{X} \sim F(\gamma_2,\gamma_1)$. Then:
$$\gamma = P[X \le f_\gamma(\gamma_1,\gamma_2)] = P\!\left[Y \ge \frac{1}{f_\gamma(\gamma_1,\gamma_2)}\right] = 1 - P\!\left[Y \le \frac{1}{f_\gamma(\gamma_1,\gamma_2)}\right]$$
$$\Rightarrow P\!\left[Y \le \frac{1}{f_\gamma(\gamma_1,\gamma_2)}\right] = 1 - \gamma \Rightarrow \frac{1}{f_\gamma(\gamma_1,\gamma_2)} = f_{1-\gamma}(\gamma_2,\gamma_1) \Rightarrow f_\gamma(\gamma_1,\gamma_2) = \frac{1}{f_{1-\gamma}(\gamma_2,\gamma_1)}.$$
Ex.1.6: $\gamma = 0.01$, $\gamma_1 = 3$, $\gamma_2 = 5$. Find $f_{0.01}(3,5)$ (the 1st percentile).
Sol: $f_{0.01}(3,5) = \dfrac{1}{f_{0.99}(5,3)} \approx 0.0355$.
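A quick check of the reciprocal identity and of Ex 1.6 with scipy (`f.ppf` is the quantile/percentile function).

```python
# f_{0.01}(3,5) = 1 / f_{0.99}(5,3); both values are ~0.0355.
from scipy import stats

print(stats.f.ppf(0.01, dfn=3, dfd=5))        # direct 1st percentile of F(3,5)
print(1 / stats.f.ppf(0.99, dfn=5, dfd=3))    # via the reciprocal identity
```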
Thm. 1.7: If $X \sim F(\gamma_1,\gamma_2)$, then
$$E[X] = \frac{\gamma_2}{\gamma_2-2},\quad \gamma_2 > 2; \qquad \operatorname{Var}(X) = \frac{2\gamma_2^2(\gamma_1+\gamma_2-2)}{\gamma_1(\gamma_2-2)^2(\gamma_2-4)},\quad \gamma_2 > 4.$$
(Note: $X = \frac{v_1/\gamma_1}{v_2/\gamma_2}$, so $E[X] = E[v_1/\gamma_1]\,E[\gamma_2/v_2]$ because $v_1 \perp v_2$.)
Ex. 1.7: Two vendors make the same product. We take independent r.s.'s of size 31 from each and measure the length of the products (assumed normal). If the length is equally variable for the two vendors, what is $P\!\left(\frac{S_1}{S_2} > 1.5\right)$? (Note: $\sigma_1^2 = \sigma_2^2$.)
Sol: We know $n_1 = n_2 = 31 \Rightarrow \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F(30,30)$.
If $\sigma_1^2 = \sigma_2^2$, then
$$P\!\left(\frac{S_1}{S_2} > 1.5\right) = P\!\left(\frac{S_1^2}{S_2^2} > 2.25\right) = P\!\left(F_{30,30} > 2.25\right) \approx 0.015 \ \text{(found by a web calculator)} = 1 - 0.985. \quad\#$$
Student’s t Distribution: by William Gosset
Why?
Answer: $\dfrac{\bar x - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, but when σ is replaced by S, $\dfrac{\bar x - \mu}{S/\sqrt{n}} \sim t(n-1)$.
Thm 1.8: If $Z \sim N(0,1)$ and $V \sim \chi^2(\gamma)$ are independent, then the distribution of $T = \dfrac{Z}{\sqrt{V/\gamma}}$ is $t(\gamma)$, with pdf
$$f(t;\gamma) = \frac{\Gamma\!\left(\frac{\gamma+1}{2}\right)}{\sqrt{\pi\gamma}\,\Gamma\!\left(\frac{\gamma}{2}\right)}\left(1 + \frac{t^2}{\gamma}\right)^{-\frac{\gamma+1}{2}}, \qquad -\infty < t < \infty.$$
Thm 1.9: If $x_1,\ldots,x_n$ is a r.s. from $N(\mu,\sigma^2)$, then $T = \dfrac{\bar x - \mu}{S/\sqrt{n}} \sim t(n-1)$.
Proof:
$$T = \frac{(\bar x - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} = \frac{Z}{\sqrt{V/(n-1)}} \sim t(n-1),$$
since $Z = \frac{\bar x - \mu}{\sigma/\sqrt n} \sim N(0,1)$, $V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$, and $Z \perp V$.  #
Also, $t(n) \overset{d}{\longrightarrow} N(0,1)$ as $n \to \infty$. In practice, for $T = \frac{\bar X - \mu}{S/\sqrt n}$:
(1) $X_i \sim N(\mu,\sigma^2)$, n small (e.g., n = 5) ⟹ exactly $t(n-1)$;
(2) $X_i \not\sim N(\mu,\sigma^2)$, n small (e.g., n = 10) ⟹ approximately $t(n-1)$;
(3) $X_i \not\sim N(\mu,\sigma^2)$, n large (e.g., n = 20) ⟹ asymptotically $N(0,1)$.
Thm 1.10: The t distribution with 1 degree of freedom, $t(1)$, is the Cauchy distribution (pdf: $f(y) = \frac{1}{\pi(1+y^2)}$, $-\infty < y < \infty$).
Proof: $f(t;\gamma=1) = \dfrac{\Gamma(1)}{\sqrt{\pi}\,\Gamma\!\left(\tfrac12\right)}\left(1+t^2\right)^{-1} = \dfrac{1}{\pi(1+t^2)} = f(y)$ with $y \equiv t$, since $\Gamma(1) = 1$ and $\Gamma\!\left(\tfrac12\right) = \sqrt{\pi}$.  #
When y is Cauchy: $E[y]$ and $\operatorname{Var}(y)$ do not exist (they are infinite/undefined).
If $u \sim N(0,1)$ and $v \sim N(0,1)$ are independent, then $\dfrac{u}{v} \sim t(1)$ (Cauchy).
Beta Distribution: developed by Karl Pearson
Why learn Beta? (1) Flexible; (2) Related to the order statistics.
If $X \sim F(\gamma_1,\gamma_2)$, then $Y = \dfrac{(\gamma_1/\gamma_2)X}{1 + (\gamma_1/\gamma_2)X}$ has the pdf
$$f(y;a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, y^{a-1}(1-y)^{b-1}, \qquad 0 < y < 1,$$
where $a = \frac{\gamma_1}{2}$, $b = \frac{\gamma_2}{2}$. This is called the Beta distribution with parameters a > 0 and b > 0, denoted $Y \sim \mathrm{BETA}(a,b)$.
The F distribution can be expressed through the Beta:
$$\frac{Y}{1-Y} = \frac{\gamma_1}{\gamma_2}\,x \quad\Longleftrightarrow\quad x = \frac{\gamma_2}{\gamma_1}\cdot\frac{Y}{1-Y}.$$
The percentiles of the Beta can be computed from the percentiles of F: the $100\gamma$ percentile of BETA(a,b) is
$$y_\gamma = \frac{a\,f_\gamma(2a,2b)}{b + a\,f_\gamma(2a,2b)}.$$
The Beta distribution is related to the order statistics: the pdf of the kth order statistic $x_{k:n}$ is
$$g_k(x_{k:n}) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(x_{k:n})]^{k-1}\,[1-F(x_{k:n})]^{n-k}\,f(x_{k:n});$$
for a UNIF(0,1) sample ($F(x) = x$, $f(x) = 1$) this is exactly the BETA(k, n-k+1) density
$$f(y;a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, y^{a-1}(1-y)^{b-1}.$$
Chapter 2: Point Estimation
Def 2.1: A statistic $T = t(x_1,\ldots,x_n)$ used to estimate the value of $\tau(\theta)$ is called an estimator of $\tau(\theta)$, and an observed value of the statistic, $t = t(x_1,\ldots,x_n)$, is called an estimate of $\tau(\theta)$.
T → a r.v.
t → a known, observed value
Examples of $\tau(\theta)$: $\theta^2$, $\ln\theta$, $e^{\theta}$, etc.
Four methods for estimating parameters:
1) Method of Moment Estimators (MME)
2) Method of Maximum Likelihood (MLE)
3) Minimax Estimator
4) Bayes Estimator
Toss a coin 3 times:
Let $x_i = \begin{cases}1 & \text{if heads}\\ 0 & \text{otherwise}\end{cases}$, e.g., $(x_1,x_2,x_3) = (1,0,1)$.
How to estimate $p = P(x = 1) = P(\text{head})$?
Write down the pdf:
1) $L(p) = f(x_1;p)\,f(x_2;p)\,f(x_3;p) = p^{\sum x_i}(1-p)^{3-\sum x_i}$
We want to find the p that maximizes this probability:
2) $\dfrac{dL}{dp} = \left(\sum x_i\right)p^{\sum x_i - 1}(1-p)^{3-\sum x_i} - \left(3-\sum x_i\right)p^{\sum x_i}(1-p)^{2-\sum x_i} = 0$
$\Rightarrow \left(\sum x_i\right)(1-p) - \left(3-\sum x_i\right)p = 0 \Rightarrow \sum x_i - \left(\sum x_i\right)p - 3p + \left(\sum x_i\right)p = 0$
$\Rightarrow p^{*} = \dfrac{\sum x_i}{3} = \bar x$
The MLE for p is $\hat p = \bar x$.
Maximum Likelihood Estimator
The maximum likelihood principle:
Choose 𝜃 for a given observed set of data such that the observed data would have been most likely
to occur.
Solving for the Maximum likelihood estimator:
1) Write down 𝐿 𝜃 : aka the likelihood function
2) Find 𝜃 that maximizes 𝐿 𝜃 : two steps
a. Take ln ∙ of 𝐿 𝜃 (reasoning explained below)
b. Take partial derivative in terms of 𝜃 and set to 0, solve for 𝜃
3) Check that 𝜃 is a maximizer, not a minimizer
a. Take second partial derivative of ln 𝐿 𝜃
(Ex 2.1: Let $x_1,\ldots,x_n$ be a rs from POI(θ). Find the MLE of θ.)
Sol: STEP 1:
$$f(x_i) = \frac{e^{-\theta}\theta^{x_i}}{x_i!}, \qquad i = 1,\ldots,n$$
$$L(\theta) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n}\frac{e^{-\theta}\theta^{x_i}}{x_i!} = \frac{e^{-n\theta}\,\theta^{\sum x_i}}{\prod_i x_i!}$$
STEP 2: Find $\hat\theta$ that maximizes $L(\theta)$:
$$\ln L(\theta) = -n\theta + \left(\sum x_i\right)\ln\theta - \sum\ln(x_i!)$$
$$\frac{\partial}{\partial\theta}\ln L(\theta) = -n + \frac{\sum x_i}{\theta} = 0 \Rightarrow \hat\theta = \frac{\sum x_i}{n} = \bar x$$
STEP 3: Verify that $\hat\theta$ is a maximizer, not a minimizer; need $\frac{\partial^2}{\partial\theta^2}\ln L(\theta) < 0$:
$$\frac{\partial^2}{\partial\theta^2}\ln L(\theta) = -\frac{\sum x_i}{\theta^2} < 0 \Rightarrow \hat\theta \text{ is a maximizer.} \quad\#$$
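A sketch (illustrative counts, not from the notes) verifying numerically that the Poisson log-likelihood above is maximized at $\bar x$; `minimize_scalar` minimizes the negative log-likelihood.

```python
# Numerical check that the Poisson MLE equals the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([2, 0, 3, 1, 4, 2, 1])                        # illustrative counts
negloglik = lambda th: th * len(x) - x.sum() * np.log(th)  # -ln L up to the constant sum(ln x_i!)
res = minimize_scalar(negloglik, bounds=(1e-6, 20), method="bounded")
print(res.x, x.mean())                                     # both ~1.857
```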
(Ex: Let $x_1,\ldots,x_n$ be a rs from EXP(1, η), $f(x) = e^{-(x-\eta)}$, $x \ge \eta$. Find the MLE of η.)
Sol: STEP 1:
$$L(\eta) = \prod_{i=1}^{n} f(x_i) = \begin{cases} e^{-\sum_i(x_i-\eta)} = e^{-\sum x_i + n\eta}, & \text{all } x_i \ge \eta\\ 0, & \text{otherwise} \end{cases}$$
STEP 2: $\ln L(\eta) = -\sum_i(x_i - \eta)$ for all $x_i \ge \eta$, and
$$\frac{\partial\ln L(\eta)}{\partial\eta} = n > 0,$$
so $\ln L(\eta)$ is increasing in η; maximize by taking η as large as possible subject to $\eta \le x_i$ for all i:
$$\hat\eta = x_{1:n} \quad (\text{the sample minimum}).$$
STEP 3: (Trivial from STEP 2.)
A poll was conducted at the Univ. of West Florida; 356 upper‐classmen were asked the question
x = "How many times have you switched majors?"
x    Observed frequency    Expected frequency (Poisson fit)
0    237                   230.4
1    90                    100.2
2    22                    21.8
3    7                     3.8
Sol: From Ex. 2.1, we know
$$\hat\theta = \bar x = \frac{\sum x_i}{n} = \frac{0\cdot 237 + 1\cdot 90 + 2\cdot 22 + 3\cdot 7}{356} = 0.435.$$
$$P(x=0) = \frac{e^{-\hat\theta}\,\hat\theta^{0}}{0!} = e^{-0.435} \approx 0.65,$$
so the fitted frequency for $x = 0$ is $356\times 0.65 \approx 230.4$, and similarly for the other cells.
(Figure: bar chart comparing the observed and Poisson-fitted frequencies for x = 0, 1, 2, 3.)
Invariance Property
In Ex. 2.1, suppose we want to estimate $\tau(\theta) = P(x=0) = e^{-\theta}$ using MLE; can we use $\hat\tau = e^{-\hat\theta}$?
STEP 1: Reparametrize $f(x;\theta) \to f(x;\tau)$, because $\tau = e^{-\theta}$ and $\theta = -\ln\tau$:
$$L(\tau) = \prod_i f(x_i;\tau) = \prod_i \frac{\tau\,(-\ln\tau)^{x_i}}{x_i!} = \frac{\tau^{n}\,(-\ln\tau)^{\sum x_i}}{\prod_i x_i!}$$
$$\ln L(\tau) = n\ln\tau + \left(\sum x_i\right)\ln(-\ln\tau) - \sum\ln(x_i!)$$
STEP 2: $\dfrac{\partial\ln L(\tau)}{\partial\tau} = \dfrac{n}{\tau} + \left(\sum x_i\right)\dfrac{1}{\tau\ln\tau} = 0$
$$\Rightarrow \ln\hat\tau = -\frac{\sum x_i}{n} \Rightarrow \hat\tau = e^{-\sum x_i/n} = e^{-\bar x}.$$
Answer: yes.
(Ex: Let $x_1,\ldots,x_n$ be a rs from POI($\mu^2$); find the MLE of μ. Substitution: $\beta = \mu^2 \Rightarrow P(x;\beta) = \frac{e^{-\beta}\beta^{x}}{x!}$, i.e., POI(β).)
Sol: STEP 1:
$$L(\mu) = \prod_{i=1}^{n} P(x_i;\mu) = \prod_i\frac{e^{-\mu^2}(\mu^2)^{x_i}}{x_i!} = \frac{e^{-n\mu^2}\,\mu^{2\sum x_i}}{\prod_i x_i!}$$
$$\ln L(\mu) = 2\left(\sum x_i\right)\ln\mu - n\mu^2 - \sum\ln(x_i!)$$
STEP 2: $\dfrac{\partial}{\partial\mu}\ln L(\mu) = \dfrac{2\sum x_i}{\mu} - 2n\mu = 0 \Rightarrow n\mu^2 = \sum x_i$
$$\Rightarrow \hat\mu = \sqrt{\frac{\sum x_i}{n}} = \sqrt{\bar x} \qquad (\hat\mu = -\sqrt{\bar x}\ \text{is discarded})$$
STEP 3: $\dfrac{\partial^2}{\partial\mu^2}\ln L(\mu) = -\dfrac{2\sum x_i}{\mu^2} - 2n < 0 \Rightarrow$ maximizer.
If you recognize the POI substitution ($\beta = \mu^2$, $\hat\beta = \bar x$), then by the invariance property $\hat\mu = \sqrt{\hat\beta} = \sqrt{\bar x}$.  #
(Ex: Type II censored sample from EXP(θ): n items are put on test and the experiment stops at the rth failure, so we observe only the r smallest order statistics $x_{1:n} \le \cdots \le x_{r:n}$. Find the MLE of θ.)
Sol: STEP 1: Write down $L(\theta)$:
$$L(\theta) = f(x_{1:n},x_{2:n},\ldots,x_{r:n};\theta), \qquad \text{where } f(x) = \frac{1}{\theta}e^{-x/\theta},\quad F(x) = 1 - e^{-x/\theta}.$$
The joint pdf of the first r order statistics is
$$L(\theta) = \frac{n!}{(n-r)!}\left[\prod_{i=1}^{r}\frac{1}{\theta}e^{-x_{i:n}/\theta}\right]\left[1 - F(x_{r:n})\right]^{n-r} = \frac{n!}{(n-r)!\,\theta^{r}}\,\exp\!\left[-\frac{\sum_{i=1}^{r} x_{i:n} + (n-r)\,x_{r:n}}{\theta}\right].$$
(Figure: timeline of the ordered failure times $x_{1:n},\ldots,x_{r:n}$; items $r+1,\ldots,n$ are still running when the test stops — this is what we observe.)
Define $T = \sum_{i=1}^{r} x_{i:n} + (n-r)\,x_{r:n}$ ⇒ the total survival time of the n items until the experiment is terminated.
STEP 2: Find $\hat\theta$:
$$\ln L(\theta) = \text{constant} - r\ln\theta - \frac{T}{\theta}$$
$$\frac{\partial\ln L(\theta)}{\partial\theta} = -\frac{r}{\theta} + \frac{T}{\theta^2} = 0 \Rightarrow \hat\theta = \frac{T}{r}$$
STEP 3: Verify: $\dfrac{\partial^2\ln L}{\partial\theta^2}\Big|_{\theta = \hat\theta} < 0$.  #
(Ex: rs from $N(\mu,\sigma^2)$; find the MLEs of μ and $\sigma^2$.)
Sol: $f(x;\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, so
$$L(\mu,\sigma^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2}\,e^{-\frac{\sum_i(x_i-\mu)^2}{2\sigma^2}}$$
$$\ln L(\mu,\sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_i(x_i-\mu)^2}{2\sigma^2}$$
$$\frac{\partial}{\partial\mu}\ln L = \frac{\sum_i(x_i-\mu)}{\sigma^2} = 0 \Rightarrow \sum_i(x_i-\mu) = 0 \Rightarrow \hat\mu = \frac{\sum x_i}{n} = \bar x$$
$$\frac{\partial}{\partial\sigma^2}\ln L = -\frac{n}{2\sigma^2} + \frac{\sum_i(x_i-\mu)^2}{2\sigma^4} = 0 \Rightarrow \hat\sigma^2 = \frac{1}{n}\sum_i(x_i-\hat\mu)^2 = \frac{1}{n}\sum_i(x_i-\bar x)^2 = \frac{n-1}{n}\,S^2$$
$$\therefore\ \hat\mu = \bar x;\quad \hat\sigma^2 = \frac{n-1}{n}S^2\ \text{is biased}\ \left(E[\hat\sigma^2] = \frac{n-1}{n}\sigma^2\right),\ \text{while } S^2 \text{ is unbiased.} \quad\#$$
Ex 2.7: RS $x_i$'s $\sim f(x;\theta,\eta) = \theta\,\eta^{\theta}\,x^{-(\theta+1)}$, $\eta \le x$, $0 < \theta$, $0 < \eta < \infty$. Find $\hat\theta$ and $\hat\eta$.
Sol: STEP 1:
$$L(\theta,\eta) = \prod_{i=1}^{n} f(x_i;\theta,\eta) = \prod_i \theta\,\eta^{\theta}\,x_i^{-(\theta+1)} = \theta^{n}\,\eta^{n\theta}\left(\prod_i x_i\right)^{-(\theta+1)}$$
$$\ln L(\theta,\eta) = n\ln\theta + n\theta\ln\eta - (\theta+1)\sum_i\ln x_i$$
STEP 2:
$$\frac{\partial\ln L}{\partial\eta} = \frac{n\theta}{\eta} > 0\ \text{(increasing in η, subject to } \eta \le x_{1:n}\text{)} \Rightarrow \hat\eta = x_{1:n}$$
$$\frac{\partial\ln L}{\partial\theta} = \frac{n}{\theta} + n\ln\eta - \sum_i\ln x_i = 0 \Rightarrow \hat\theta = \frac{n}{\sum_i\ln\!\left(x_i/\hat\eta\right)} = \frac{n}{\sum_i\ln\!\left(x_i/x_{1:n}\right)}$$
STEP 3: Verify...  #
****Find the MLE for the α percentile (for the two‐parameter exponential with CDF $F(x) = 1 - e^{-(x-\eta)/\theta}$)?
$$F(x_\alpha) = \alpha = 1 - e^{-(x_\alpha-\eta)/\theta} \Rightarrow x_\alpha = \eta - \theta\ln(1-\alpha) \Rightarrow \text{use the invariance property.}$$
Method of Moments Estimators: by Karl Pearson
Method of Moments Estimator: Suppose x is a continuous rv and its pdf $f(x;\theta_1,\ldots,\theta_k)$ has k unknown parameters. The jth population (theoretical) moment about the origin is
$$\mu_j' = E[x^j] = \int_{-\infty}^{\infty} x^j\,f(x;\theta_1,\ldots,\theta_k)\,dx, \qquad j = 1,\ldots,k,$$
and the jth population moment about the mean is $\mu_j = E\!\left[(x - E[x])^j\right] = E\!\left[(x-\mu)^j\right]$.
Def 2.5: The first k sample moments are
$$M_j' = \frac{\sum_{i=1}^{n} x_i^{\,j}}{n}, \qquad j = 1,2,\ldots,k.$$
The MMEs are obtained by equating population to sample moments and solving for the parameters:
$$\mu_1' = E[X] = \int x\,f(x;\theta_1,\ldots,\theta_k)\,dx = M_1' = \frac{\sum x_i}{n}$$
$$\mu_2' = E[X^2] = \int x^2 f(x;\theta_1,\ldots,\theta_k)\,dx = M_2' = \frac{\sum x_i^2}{n}$$
$$\vdots$$
$$\mu_k' = E[X^k] = \int x^k f(x;\theta_1,\ldots,\theta_k)\,dx = M_k' = \frac{\sum x_i^k}{n}$$
If x is discrete, the equations become
$$\sum_{\forall x} x^j\,P(X = x;\theta_1,\ldots,\theta_k) = \frac{\sum_i x_i^{\,j}}{n}, \qquad j = 1,\ldots,k.$$
(Example: a rs from BIN(1, θ) observed as 1, 0, 1, 1, 0. Find the MME $\hat\theta$.)
Theoretical moment: $E[X] = 0\cdot P(X=0) + 1\cdot P(X=1) = 0\cdot(1-\theta) + 1\cdot\theta = \theta$.
Sample moment: $\dfrac{\sum x_i}{n} = \dfrac{1+0+1+1+0}{5} = \dfrac{3}{5} = 0.6$.
$\therefore\ \hat\theta = 0.6 = \bar x$.  #
Ex 2.9: If the $x_i$'s $\sim$ EXP(1, η), $f(x;1,\eta) = e^{-(x-\eta)}$, $\eta \le x$. Find the MME $\hat\eta$.
Sol:
$$E[X] = 1 + \eta = \frac{\sum x_i}{n} = \bar x \Rightarrow \hat\eta_{\mathrm{MME}} = \bar x - 1.$$
(Recall the MLE: $\hat\eta_{\mathrm{MLE}} = x_{1:n}$.) Which one is better? See the next topic.
(Example: MMEs for μ and $\sigma^2$.)
Sol:
1) $E[x] = \mu = \bar x \Rightarrow \hat\mu = \bar x$
2) $E[x^2] = \sigma^2 + \mu^2 = \dfrac{\sum x_i^2}{n} \Rightarrow \hat\sigma^2 = \dfrac{\sum x_i^2}{n} - \bar x^2 = \dfrac{\sum(x_i-\bar x)^2}{n} = \dfrac{n-1}{n}\,S^2$
Ex 2.11: Let $X \sim P(k;r,p) = \binom{k-1}{r-1} p^{r}(1-p)^{k-r}$, $k = r, r+1, \ldots$ (the Negative Binomial Distribution). Find the MMEs $\hat r$ and $\hat p$.
1) $E[X] = E\!\left[\sum_{i=1}^{r} Y_i\right] = \dfrac{r}{p}$, where the $Y_i$ are iid geometric(p); equate to $\bar x$.
2) $E[X^2] = \operatorname{Var}(X) + (E[X])^2 = \dfrac{r(1-p)}{p^2} + \dfrac{r^2}{p^2}$ (the $Y_i$'s are ⊥, so $\operatorname{Var}(X) = \sum\operatorname{Var}(Y_i)$); equate to $\dfrac{\sum x_i^2}{n}$.
Solve for r and p, writing $\hat v = \frac{n-1}{n}S^2$ for the sample variance $\frac{\sum(x_i-\bar x)^2}{n}$:
$$\hat r = \frac{\bar x^2}{\dfrac{n-1}{n}S^2 + \bar x}, \qquad \hat p = \frac{\bar x}{\dfrac{n-1}{n}S^2 + \bar x}. \quad\#$$
Criterion for Evaluating Estimators:
Def 2.7: An estimator T is said to be an unbiased estimator of $\tau(\theta)$ if $E[T] = \tau(\theta)$ for all $\theta \in \Omega$. Otherwise, we say that T is a biased estimator of $\tau(\theta)$.
Alternate definition: A point estimator $\hat\theta$ is called an unbiased estimator of the parameter θ if $E[\hat\theta] = \theta$ for all possible values of θ. Otherwise $\hat\theta$ is said to be biased. Furthermore, the bias of $\hat\theta$ is given by $B = E[\hat\theta] - \theta$. Note: bias occurs when the sample does not accurately represent the population from which it is drawn.
Thm 2.3: $E[\bar x] = \mu$ and $E[S^2] = \sigma^2$.
Ex. 2.12: Let the $x_i$'s be a rs from $U(0,\theta)$. Find an unbiased estimator for θ based on $x_{1:n}$.
Sol: From the review lecture, we know
$$F_{x_{1:n}}(x) = 1 - [1 - F(x)]^n = 1 - \left(1 - \frac{x}{\theta}\right)^{n}$$
$$f_{x_{1:n}}(x) = \frac{dF_{x_{1:n}}(x)}{dx} = \frac{n}{\theta}\left(1 - \frac{x}{\theta}\right)^{n-1}, \qquad 0 < x < \theta$$
$$E[x_{1:n}] = \int_0^{\theta} x\,f_{x_{1:n}}(x)\,dx = \int_0^{\theta}\frac{nx}{\theta}\left(1 - \frac{x}{\theta}\right)^{n-1}dx = -\int_0^{\theta} x\,d\!\left(1 - \frac{x}{\theta}\right)^{n} = \int_0^{\theta}\left(1 - \frac{x}{\theta}\right)^{n}dx = \frac{\theta}{n+1}$$
(integration by parts). If $T = (n+1)\,x_{1:n}$, then
$$E[T] = (n+1)E[x_{1:n}] = (n+1)\cdot\frac{\theta}{n+1} = \theta,$$
$\therefore$ T is an unbiased estimator for θ.  #
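A simulation sketch (illustrative θ = 10, n = 5; not from the notes) checking that $T = (n+1)x_{1:n}$ has mean θ; it also shows that T has a large variance, which motivates the comparison with estimators based on the maximum later on.

```python
# E[(n+1) * min(x)] should be close to theta for a U(0, theta) sample.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 10.0, 5, 200_000
x = rng.uniform(0, theta, size=(reps, n))
T = (n + 1) * x.min(axis=1)
print(T.mean(), T.var())   # mean ~10 (unbiased), but the variance is large
```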
Ex 2.13: 𝑅𝑆 𝑥 𝑠~𝐸𝑋𝑃 𝜃 , 𝜃 is mean. Find an unbiased estimator for rate .
Sol: ∴ 𝑇ℎ𝑒 𝑀𝐿𝐸 𝑓𝑜𝑟 𝜃 𝑖𝑠 𝑥̅ 𝑎𝑘𝑎 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑓𝑜𝑟 𝜃 . If E T θ, E e e
∑𝑥 ∑𝐸 𝑥 ∑𝜃
𝐸 𝑥̅ 𝐸 𝜃
𝑛 𝑛 𝑛
1 1
𝑄: 𝐼𝑠 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑓𝑜𝑟 ?
𝑥̅ 𝜃
1
𝐸 ? ∴ 𝑥 ~𝐸𝑋𝑃 𝜃 ⇒ Γ 1, 𝜃
𝑥̅
2𝑥
∴ ~𝑋 𝐶𝑜𝑟𝑜𝑙𝑙𝑎𝑟𝑦 0.3
𝜃
2𝑥 2𝑛
𝑌 𝑥̅ ~𝑋 𝐶𝑜𝑟𝑜𝑙𝑙𝑎𝑟𝑦 0.4
𝜃 𝜃
1 2𝑛 Γ 𝑛 1
𝐸 𝐸𝑌 𝐸 𝑥̅ ⏞ 2
𝑌 𝜃 Γ 𝑛
1 1
2𝑛 1
𝜃 1 1 1 1 𝑛 1 1
∴ 𝐸 ⇒𝐸 ∙
2𝑛 𝑥̅ 2𝑛 1 𝑥̅ 𝑛 1 𝜃 𝜃
𝑛 1 1 1
∴ 𝑇 ∙ is unbiased for #
𝑛 𝑥̅ 𝜃
***Second Criterion => Variance of Estimator:
unbiased for θ
Sol: Try x1:n
𝜃
𝑥 : min 𝑥 , … , 𝑥 ~𝐸𝑋𝑃
𝑛
𝑃𝑥 : 𝑥 𝑃𝑥 𝑋, … , 𝑥 𝑋
𝜃
∴𝐸 𝑥 : 𝜃
𝑛
∴ 𝜃 𝑛𝑥 : is unbiased for θ.
Q: Which one is better? Is $\operatorname{Var}(\hat\theta_1)$ greater than or less than $\operatorname{Var}(\hat\theta_2)$?
𝑉𝑎𝑟 𝑥 𝜃
𝑉𝑎𝑟 𝜃 𝑉𝑎𝑟 𝑥̅
𝑛 𝑛
𝜃
𝑉𝑎𝑟 𝜃 𝑉𝑎𝑟 𝑛𝑥 : 𝑛 𝑉𝑎𝑟 𝑥 : 𝑛 𝜃
𝑛
Sol:
1) 𝐸 𝜃 ∑𝐸 𝑥 ∙∑ 𝜃
𝑛 1
𝐸𝜃 𝐸𝑥 :
𝑛
𝑛 1 𝑛𝑥
𝑥∙ 𝑑𝑥
𝑛 𝜃
𝑛 1
𝑥 𝑑𝑥
𝜃
𝑛 1 1 𝜃
∙ 𝑥 |
𝜃 𝑛 1 0
𝜃
𝜃
𝜃
𝑁𝑜𝑡𝑒: 𝐹 :
𝑥 𝑃𝑥 : 𝑥 𝑃 max 𝑥 , … , 𝑥 𝑥 𝑃𝑥 𝑥, … , 𝑥 𝑥
~ , 𝑥
𝑃𝑥 𝑥 ∙ ⋯∙ 𝑃 𝑥 𝑥 𝐹 𝑥 ⋯𝐹 𝑥
𝜃
𝑛𝑥
⇒𝑓 𝑥
:
𝜃
Ex 2.16: During World War II, a very simple statistical procedure was developed for estimating German war production. Every piece of German equipment (V‐2 rockets, tanks, or even automobile tires) was stamped with a serial number that indicated the order in which it was manufactured. If the total # of, say, Mark I tanks produced by a certain date was N, each would bear one of the integers 1 to N.
As the war progressed, some of these numbers became known to the Allies — either by the direct capture of a piece of equipment or from records seized when a command post was overrun.
The problem was to estimate N using the sample of "captured" serial numbers
$$1 \le X_1 < X_2 < \cdots < X_n \le N.$$
Q: How do we estimate N using $x_1,\ldots,x_n$?
Sol: The 1st method assumes equal probability of every sample of serial numbers:
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \frac{1}{\binom{N}{n}}.$$
N is estimated by adding the average gap between captured numbers to the maximum order statistic:
$$\hat N_1 = x_{n:n} + \frac{1}{n-1}\sum_{i=1}^{n-1}\left(x_{(i+1)} - x_{(i)} - 1\right).$$
E.g., for 2, 6, 8:
$$\hat N_1 = 8 + \frac{1}{2}\left[(6-2-1) + (8-6-1)\right] = 8 + 2 = 10.$$
The 2nd method uses the discrete version of the MLE (adjusted for bias):
$$\hat N_2 = \frac{n+1}{n}\,x_{n:n} - 1.$$
It can be shown that $E[\hat N_2] = N$, i.e., $\hat N_2$ is unbiased.  #
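A sketch computing both estimates for the captured serial numbers 2, 6, 8 used above, following the two formulas as reconstructed in the notes.

```python
# Two estimates of N from the example sample {2, 6, 8}.
x = sorted([2, 6, 8])
n, xmax = len(x), x[-1]
avg_gap = sum(x[i + 1] - x[i] - 1 for i in range(n - 1)) / (n - 1)  # average gap between captures
N_hat_1 = xmax + avg_gap              # max + average gap: 8 + 2 = 10
N_hat_2 = (n + 1) / n * xmax - 1      # (n+1)/n * max - 1: ~9.67
print(N_hat_1, N_hat_2)
```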
Uniformly Minimum Variance Unbiased Estimator: Let 𝑥 , … , 𝑥 be a RS of
size n from 𝑓 𝑥; 𝜃 . An estimator 𝑇 ∗ of 𝜏 𝜃 is called a uniformly minimum
variance unbiased estimator (UMVUE) of 𝜏 𝜃 if:
1 𝑇 ∗ 𝑖𝑠 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑓𝑜𝑟 𝜏 𝜃
𝑎𝑛𝑑
2 𝐹𝑜𝑟 𝑎𝑛𝑦 𝑜𝑡ℎ𝑒𝑟 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑇 𝑜𝑓 𝜏 𝜃 , 𝑉𝑎𝑟 𝑇 ∗ 𝑉𝑎𝑟 𝑇 , ∀𝜃 ∈ Ω
Cramer‐Rao Lower Bound (CRLB):
$$\operatorname{Var}(T) \ge \frac{\left[\tau'(\theta)\right]^2}{E\!\left[\left(\dfrac{\partial}{\partial\theta}\ln f(x_1,\ldots,x_n;\theta)\right)^{2}\right]}$$
or, for a RS,
$$\operatorname{Var}(T) \ge \frac{\left[\tau'(\theta)\right]^2}{n\,E\!\left[\left(\dfrac{\partial}{\partial\theta}\ln f(x;\theta)\right)^{2}\right]}$$
Note: $E\!\left[\left(\frac{\partial}{\partial\theta}\ln f(x_1,\ldots,x_n;\theta)\right)^2\right]$ is the Fisher Information.
Proof of CRLB theorem: Define 𝑈 𝑥 , … , 𝑥 ; 𝜃 ≡ 𝑈 ≡ ln 𝑓 𝑥 , … , 𝑥 ; 𝜃
1 𝜕
⇒𝑈 𝑓 𝑥 ,…,𝑥 ;𝜃
𝑓 𝑥 , … , 𝑥 ; 𝜃 𝜕𝜃
𝐸𝑢 0 because:
𝐸𝑢 ⋯ 𝑈 𝑥 , … 𝑥 ; 𝜃 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝜕
⋯ 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝜕𝜃
𝜕
⋯ 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝜕𝜃
𝜕
1
𝜕𝜃
0
𝜏 𝜃 𝐸𝑇 ⋯ 𝑡 𝑥 , … , 𝑥 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝜕
𝜏 𝜃 ⋯ 𝑡 𝑥 , … , 𝑥 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝜕𝜃
𝜕
⋯ 𝑡 𝑥 ,…,𝑥 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝜕𝜃
⋯ 𝑡 𝑥 , … , 𝑥 𝑈 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑑𝑥 ⋯ 𝑑𝑥
𝐸 𝑇𝑈
∴ 𝑐𝑜𝑟 𝑇, 𝑈 𝐸 𝑇𝑈 𝐸𝑇𝐸𝑈 𝐸 𝑇𝑈
𝑐𝑜𝑟 𝑇, 𝑈 𝑐𝑜𝑟 𝑇, 𝑈
∴ 1 𝑝 1⇒ 1 1⇒ 1
𝑉𝑎𝑟 𝑇 𝑉𝑎𝑟 𝑈 𝑉𝑎𝑟 𝑇 𝑉𝑎𝑟 𝑈
𝑐𝑜𝑟 𝑇, 𝑈
⇒ 𝑉𝑎𝑟 𝑇
𝑉𝑎𝑟 𝑈
𝜏 𝜃
𝐸𝑈 𝐸𝑈
𝜏 𝜃
𝜕
𝐸 ln 𝑓 𝑥 , … , 𝑥 ; 𝜃
𝜕𝜃
𝐼𝑓 𝑥 , … , 𝑥 𝑖𝑠 𝑎 𝑟𝑠, 𝑡ℎ𝑒𝑛: 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝑓 𝑥 ;𝜃
𝜕 𝜕
∴ 𝑈 𝑥 ,…,𝑥 ;𝜃 ln 𝑓 𝑥 ;𝜃 ln 𝑓 𝑥 ; 𝜃
𝜕𝜃 𝜕𝜃
𝜕
𝐴𝑙𝑠𝑜, 𝐸𝑈 ⏟ 𝑉𝑎𝑟 𝑈 𝑉𝑎𝑟 ln 𝑓 𝑥 ; 𝜃
𝜕𝜃
𝜕
𝑉𝑎𝑟 ln 𝑓 𝑥 ; 𝜃
𝜕𝜃
𝜕
𝑛𝐸 ln 𝑓 𝑥; 𝜃
𝜕𝜃
𝜏 𝜃
∴ 𝑉𝑎𝑟 𝑇 #
𝜕
𝑛𝐸 ln 𝑓 𝑥; 𝜃
𝜕𝜃
Sol:
1) 𝜏 𝜇 𝜇⇒𝜏 𝜇 1
2) 𝑓 𝑥; 𝜇, 𝜎 𝑒
√
1
ln 𝑓 𝑥; 𝜇, 𝜎 ln 𝑥 𝜇
√2𝜋𝜎
𝜕 𝑥 𝜇 𝑥 𝜇
ln 𝑓 𝑥; 𝜇, 𝜎 0 𝜇
𝜕𝜇 𝜎 𝜎
𝑥 𝜇 1 𝑥 𝜇 1 1 1
𝐸 𝐸 ⏟ 𝐸𝑍 𝑉𝑎𝑟 𝑍 𝐸𝑍
𝜎 𝜎 𝜎 𝜎 𝜎 𝜎
~ ,
3) ∴ 𝐶𝑅𝐿𝐵 𝜇 : 𝑉𝑎𝑟 𝑇
; , ∙
𝜎
∴ 𝑉𝑎𝑟 𝑥̅ ∴ 𝑥̅ 𝑖𝑠 𝑡ℎ𝑒 𝑈𝑀𝑉𝑈𝐸 #
𝑛
𝟐 𝝉 𝝁 𝝁𝟐 𝟐
𝝉 𝝁 𝝁𝟐
𝟐
**𝑪𝑹𝑳𝑩 𝝁 𝝏 𝟐 ⏞ 𝝏 𝟐
𝒏𝑬 𝐥𝐧 𝒇 𝒙;𝝁,𝝈𝟐 𝒏𝑬 𝐥𝐧 𝒇 𝒙;𝝁,𝝈𝟐
𝝏𝝁 𝝏𝝁
𝟐
𝟐𝝁 ∙ 𝝁
𝟐
𝝏
𝒏𝑬 𝐥𝐧 𝒇 𝒙; 𝝁, 𝝈𝟐
𝝏𝝁
𝟐𝝁 𝟐 𝑪𝑹𝑳𝑩 𝝁
Note:
∑
E.g., Ex 2.18: 𝑇 𝑥̅ 𝑎𝑈 𝑏 (U is linear function of xi)
b) Suppose:
⇒𝑔 𝜏 𝜃 𝐸𝑇 ⏟ 𝑎 𝐸𝑇 𝑏 𝑎 𝜏 𝜃 𝑏
∴ 𝑜𝑛𝑙𝑦 𝑎 𝑝 𝑏 𝑎𝑑𝑚𝑖𝑡𝑠 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑜𝑟 𝑤𝑖𝑡ℎ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑒𝑞𝑢𝑎𝑙 𝑡ℎ𝑒 𝑙𝑜𝑤𝑒𝑟 𝑏𝑜𝑢𝑛𝑑
2𝑝 ∗ 𝑝 1 𝑝
𝜏 𝑝 𝑝 ⇒ 𝐶𝑅𝐿𝐵 𝑝 𝑝 𝐶𝑅𝐿𝐵 𝑝
𝑛
2𝑥
Ex 2.19 Let RS 𝑥 𝑠~𝑓 𝑥; 𝜃 ,0 𝑥 𝜃
𝜃
a) Find an unbiased estimator T for 𝜃
b) What is Var(T)=? Compare Var(T) with CRLB(𝜃 .
Sol:
Thm 2.5: If an unbiased estimator for 𝜏 𝜃 exists, the variance of which achieves the CRLB, then only a
linear function of 𝜏 𝜃 will admit an unbiased estimator, the variance of which achieves the
corresponding CRLB.
Def 2.9: The relative efficiency of an unbiased estimator T of 𝜏 𝜃 to another unbiased estimator T* of
𝜏 𝜃 is:
𝑉𝑎𝑟 𝑇 ∗
𝑟𝑒 𝑇, 𝑇 ∗
𝑉𝑎𝑟 𝑇
Def 2.6: Mean Square Error and Bias: If T is an estimator of $\tau(\theta)$, then the bias is $b(T) = E[T] - \tau(\theta)$ and the Mean Squared Error (MSE) is:
$$\mathrm{MSE}(T) = E\!\left[(T - \tau(\theta))^2\right] = \operatorname{Var}(T) + [b(T)]^2.$$
Thm 2.6: $\mathrm{MSE}(T) = \operatorname{Var}(T) + [b(T)]^2$.
Proof: Homework 3 Q.1
Ex 2.20:
RS 𝑥 𝑠~𝐸𝑋𝑃 1, 𝜂 where ηis location parameter. We know that
𝑀𝑀𝐸: 𝜂̂ 𝑥̅ 1
Compare the MSEs.
𝑀𝐿𝐸: 𝜂̂ 𝑥:
𝐸 𝜂̂ 𝐸 𝑥̅ 1 𝐸 𝑥̅ 1 1 𝜂 1 𝜂 ⇒ 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑
1
𝐸 𝜂̂ 𝐸𝑥 : 𝐸𝑥 : 𝜂 𝜂 𝐸𝑥 : 𝜂 𝜂 ⏟ 𝜂 ⇒ 𝑏𝑖𝑎𝑠𝑒𝑑
𝑛
𝑉𝑎𝑟 𝑥
𝑛
𝑉𝑎𝑟 𝑌
𝑤ℎ𝑒𝑟𝑒 𝑌~𝐸𝑋𝑃 1
𝑛
1
𝑛
𝑉𝑎𝑟 𝑥 : 𝜂
𝑉𝑎𝑟 𝑍 𝑤ℎ𝑒𝑟𝑒 𝑍 𝑑𝑒𝑓𝑖𝑛𝑒𝑑 𝑖𝑛 𝑁𝑜𝑡𝑒
1
𝑛
1 1
𝑀𝑆𝐸 𝜂̂ 𝑉𝑎𝑟 𝜂̂ 𝐸 𝜂̂ 𝜂 0
𝑛 𝑛
∴
1 1 1 1 2
𝑀𝑆𝐸 𝜂̂ 𝑉𝑎𝑟 𝜂̂ 𝐸 𝜂̂ 𝜂 𝜂 𝜂
𝑛 𝑛 𝑛 𝑛 𝑛
Bayes and Minimax Estimators
Def 2.11: Loss Function: If T is an estimator of 𝜏 𝜃 , then the loss function is any real‐value function,
𝐿 𝑡; 𝜃 , such that:
𝐿 𝑡; 𝜃 0 𝑓𝑜𝑟 𝑒𝑣𝑒𝑟𝑦 𝜃
𝑎𝑛𝑑
𝐿 𝑡; 𝜃 0 𝑤ℎ𝑒𝑛 𝑡 𝜏 𝜃
Def 2.12:Risk Function: The risk function is defined to be the expected loss:
𝑅 𝜃 𝐸 𝐿 𝑇; 𝜃
𝑅 𝜃 𝑅 𝜃 ∀𝜃 ∈ Ω
𝑎𝑛𝑑
𝑅 𝜃 𝑅 𝜃 𝑓𝑜𝑟 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝜃
An estimator is admissible iff there is no better estimator.
** Not called “best” estimator because it is the “best” only for a particular loss function 𝐿 𝑡; 𝜃 .
Def 2.14: Minimax Estimator:
An estimator 𝑇 is a minimax estimator if:
max 𝑅 𝜃 max 𝑅 𝜃
𝑇 𝜃 𝜃 𝜃
𝑇 𝑇 𝜃 𝜃 𝜃
⋮
Def 2.15: Bayes Risk:
𝐴 𝐸 𝑅 𝜃 𝑅 𝜃 𝑝 𝜃 𝑑𝜃
The Bayes estimator is the one which gives the smallest risk(Bayes)
↑
𝐸 𝐿 𝑇, 𝜃
Posterior Distribution: the conditional density of θ given the sample observations $X = (x_1,\ldots,x_n)$ is called the posterior density (or posterior pdf) and is given by:
$$f_{\theta|X}(\theta) = \frac{f(x_1,\ldots,x_n\mid\theta)\,p(\theta)}{\int f(x_1,\ldots,x_n\mid\theta)\,p(\theta)\,d\theta}$$
The posterior distribution integrates the prior information about θ with the updated sample information X.
∑
̅
𝑐 ∙𝑒
∑ ̅
𝑐 𝑒 ∙𝑒
̅ ̅ ̅
𝑐 𝑒
̅
𝑁𝑜𝑡𝑒: 𝑐 𝑐 ∙𝑒
̅
𝑐 𝑒
̅
𝑐 𝑒
𝑛𝑥̅ 1
𝑁𝑜𝑡𝑒: 𝜇 𝑎𝑛𝑑 𝜎
𝑛 1 𝑛 1
2
2 𝜎 2 𝜎
𝑐 𝑒
𝑐 𝑒
∴ 𝑓| 𝜇 ⏟ 𝑐 𝑒
𝑐 ∴
1
𝑓| 𝜇 ⏟ 𝑒
|
2𝜋𝜎
𝑓 | 𝜇 𝑑𝜇 1
𝑐
∙𝑒 𝑑𝜇 1
𝑐
𝑐 1 1 𝑐 1
⇒ 𝑒 𝑑𝜇 ⇒
𝑐 2𝜋𝜎 2𝜋𝜎 𝑐 2𝜋𝜎
**** No Need to compute two expectations to find Bayes estimator:
𝐸 𝑅 𝜃 𝑅 𝜃 𝑝 𝜃 𝑑𝜃
𝐸 | 𝐿 𝑇; 𝜃 𝑝 𝜃 𝑑𝜃
𝐿 𝑇; 𝜃 𝑓 𝑥|𝜃 𝑝 𝜃 𝑑𝑥 𝑑𝜃
𝐿 𝑇; 𝜃 𝑓 𝑥; 𝜃 𝑑𝑥 𝑑𝜃
𝐿 𝑇; 𝜃 𝑓 𝜃|𝑥 𝑓 𝑥 𝑑𝑥 𝑑𝜃
𝐿 𝑇; 𝜃 𝑓 𝜃|𝑥 𝑑𝜃 𝑓 𝑥 𝑑𝑥
𝐸 | 𝐿 𝑇; 𝜃 𝑓 𝑥 𝑑𝑥
Thm 2.7: If𝑥 , … , 𝑥 denotes a RS from 𝑓 𝑥|𝜃 , then the Bayes estimator is the estimator that minimizes
the expected loss relative to the posterior distribution:
𝐸 | 𝐿 𝑇; 𝜃 .
Thm 2.8: The Bayes Estimator, T, of 𝜏 𝜃 under the squared error loss function, 𝐿 𝑇; 𝜃 𝑇 𝜏 𝜃 ,
is the conditional mean of 𝜏 𝜃 relative to the posterior:
𝑇 𝐸 | 𝜏 𝜃 𝜏 𝜃 𝑓 | 𝜃 𝑑𝜃
𝐼𝑓 𝜏 𝜇 𝜇⇒𝑇 𝐸 | 𝜇 𝜇𝑓 | 𝜇 𝑑𝜇 𝜇
E.g.:
𝜏 𝜇 𝜇 ⇒𝑇 𝑉𝑎𝑟 𝜇 𝐸𝜇 𝜎 𝜇
Ex 11.2.5: Let $X_1,\ldots,X_n$ be a sample from a geometric distribution with parameter p, $0 < p < 1$. Assume that the prior distribution of p is BETA with $\alpha = 4$ and $\beta = 4$.
a) Find the posterior distribution of p.
b) Find the Bayes estimate under the quadratic (squared-error) loss function.
Sol:
a) The likelihood is
$$L(X_1,\ldots,X_n\mid p) = \prod_i p(1-p)^{x_i-1} = p^{n}(1-p)^{\sum x_i - n}.$$
The product of the likelihood and the prior is
$$p^{n}(1-p)^{\sum x_i - n}\cdot 140\,p^{3}(1-p)^{3} = 140\,p^{\,n+3}(1-p)^{\sum x_i - n + 3}.$$
Because the posterior is proportional to the prior times the likelihood, factoring out the constant 140 (the posterior is proportional, not necessarily equal) gives a beta kernel with exponents $n+3$ and $\sum x_i - n + 3$. Therefore the posterior is $\mathrm{BETA}\!\left(n+4,\ \sum x_i - n + 4\right)$.
b) Recall that for a $\mathrm{BETA}(\alpha,\beta)$ random variable the mean is $\frac{\alpha}{\alpha+\beta}$. Because the Bayes estimate is the posterior mean, the mean of $\mathrm{BETA}\!\left(n+4,\ \sum x_i - n + 4\right)$ is
$$\hat p = \frac{n+4}{(n+4) + \left(\sum x_i - n + 4\right)} = \frac{n+4}{\sum x_i + 8}.$$
Sol: Compute the posterior:
𝑓 𝑥 , … , 𝑥 |𝜃 𝑝 𝜃 𝑒 𝜃
𝑓 | 𝜃 𝑁𝑜𝑡𝑒: 𝑓 𝑥
𝑓 𝑥 , … , 𝑥 |𝜃 𝑝 𝜃 𝑑𝜃 𝑥!
𝑒 𝜃 1
∏ ∙ ∙𝜃 𝑒
𝑥! 𝛽 Γ 𝛼
𝑒 𝜃 1
∏ ∙ ∙𝜃 𝑒 𝑑𝜃
𝑥! 𝛽 Γ 𝛼
𝑒 𝜃∑ 𝜃 𝑒
𝑒 𝜃∑ 𝜃 𝑒 𝑑𝜃
𝑒 𝜃∑
𝑒 𝜃∑ 𝑑𝜃
1
𝑁𝑜𝑡𝑒: 𝛼 𝑥 𝛼, 𝛽
1
𝑛
𝛽
1
𝛽 Γ 𝛼 𝑒 𝜃
∙ 𝑁𝑜𝑡𝑒: 𝐷𝑒𝑛𝑜𝑚𝑖𝑛𝑎𝑡𝑜𝑟 1
1
𝛽 Γ 𝛼 𝑒 𝜃 𝑑𝜃
1
𝑒 𝜃
𝛽 Γ 𝛼
1
∴𝑓 | 𝜃 𝑖𝑠 𝐺𝐴𝑀 𝛼 , 𝛽 𝐺𝐴𝑀 𝑥 𝛼,
1
𝑛
𝛽
∑𝑥 𝛼
∴ 𝐵𝑦 𝑇ℎ𝑚 2.8, 𝜃 𝐸 | 𝜃
1
𝑛
𝛽
1
𝑛 ∑𝑥 𝛽
∙ 𝛼𝛽
1 𝑛 1
𝑛 𝑛
𝛽 𝛽
1
𝑛 𝛽
𝑥̅ 𝐸 𝜃
1 1
𝑛 𝑛
𝛽 𝛽
𝑛 ∑𝑥 𝛼
𝑉𝑎𝑟 𝑥̅ 𝐸 𝜃 𝑁𝑜𝑡𝑒: 𝐸 𝑥 𝜃
1 1
𝑛 𝑛
𝛽 𝛽
𝑛 𝑉𝑎𝑟 𝑥 𝑛𝜃 𝛼
𝜃 𝑁𝑜𝑡𝑒: 𝑉𝑎𝑟 𝑥 𝜃
1 𝑛 1
𝑛 𝑛
𝛽 𝛽
𝑛 𝜃 𝑛𝜃 𝛼
𝜃
1 𝑛 1
𝑛 𝑛
𝛽 𝛽
𝜃 →
𝑛𝜃 𝛼 𝜃
𝛽 →
⎯⎯ 𝑉𝑎𝑟 𝑥̅ #
1 𝑛
𝑛
𝛽
Ex 2.23: Predicting the annual number of hurricanes that will hit the U.S. mainland is a problem receiving
a great deal of public attention. (e.g., four major hurricanes struck Florida in Summer 2004; and
Hurricane Katrina attacked New Orleans in August 2005) Assuming the number of hurricanes reaching
the mainland is Poisson distributed with a yearly expected number of 𝜃, and the prior distribution of 𝜃 is
gamma, i.e., $p(\theta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,\theta^{\alpha-1}e^{-\theta/\beta}$, $\theta > 0$. What is the Bayes estimator of θ? Assume a squared-error loss function.
Sol: Prior is $\mathrm{GAM}(\alpha,\beta)$ for θ, with $f(x) \sim \mathrm{POI}(\theta)$.
Because the oldest data is the most unreliable, we use it only to estimate α and β: the older records give 88 hurricanes over 50 years, so
$$E[\theta] = \alpha\beta = \frac{88}{50}, \qquad \text{and we guess } \alpha = 88,\ \beta = \frac{1}{50},$$
$$\Rightarrow p(\theta) = \frac{1}{(1/50)^{88}\,\Gamma(88)}\,\theta^{87} e^{-50\theta}.$$
With the conjugate update (gamma prior + Poisson likelihood), the posterior is $\mathrm{GAM}\!\left(\sum x_i + \alpha,\ \frac{1}{n + 1/\beta}\right)$, so the Bayes estimator (posterior mean) based on the more recent n = 100 years with $\sum x_i = 164$ hurricanes is
$$\hat\theta = \frac{\sum x_i + \alpha}{n + 1/\beta} = \frac{164 + 88}{100 + 50} = \frac{252}{150} \approx 1.7\ \frac{\text{hurricanes}}{\text{year}}. \quad\#$$
150 𝑦𝑒𝑎𝑟
Thm 2.9: The Bayes estimator, 𝜃 , of 𝜃 under absolute error loss
𝐿 𝜃; 𝜃 𝜃 𝜃
Is the median of the posterior 𝑓 | 𝜃 .
***SUMMARY:
𝐿𝑜𝑠𝑠 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛: 𝐿 𝑇; 𝜃
↓
𝑅𝑖𝑠𝑘 𝐹𝑢𝑛𝑐𝑡𝑖𝑜𝑛: 𝑅 𝜃 𝐸 𝐿 𝑇; 𝜃
↙ ↓ ↘
𝐸 𝑅 𝜃 𝐸 𝑅 𝜃
𝑅 𝜃 𝑅 𝜃 ∀𝜃 ∈ Ω 𝐵𝑎𝑦𝑒𝑠 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑜𝑟
max 𝑅 𝜃 max 𝑅 𝜃
𝐴𝑑𝑚𝑖𝑠𝑠𝑖𝑏𝑙𝑒 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑜𝑟 ↙ ↘
𝐵𝑒𝑠𝑡 𝑓𝑜𝑟 𝑡ℎ𝑖𝑠 𝐿 𝑇; 𝜃 𝑀𝑖𝑛𝑖𝑚𝑎𝑥 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑜𝑟 𝑆𝑞𝑢𝑎𝑟𝑒𝑑 𝐿𝑜𝑠𝑠 𝐴𝑏𝑠𝑜𝑙𝑢𝑡𝑒 𝑒𝑟𝑟𝑜𝑟 𝐿𝑜𝑠𝑠
𝜏̂ 𝜃 𝐸 | 𝜏 𝜃 𝜃 𝑚𝑒𝑑𝑖𝑎𝑛 𝑜𝑓 𝑓 | 𝜃
𝑚𝑒𝑎𝑛 𝑜𝑓 𝑝𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟
E.g. of Bayes: 𝜃 𝐸 | 𝜃 𝑚𝑒𝑎𝑛 𝑜𝑓 𝑡ℎ𝑒 𝑝𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟 𝜃
𝐸 | 𝜃 𝜃 𝑓 | 𝜃 𝑑𝜃
Chapter 3: Sufficiency and Completeness
Sufficiency
0 𝑖𝑓 ℎ𝑒𝑎𝑑𝑠
Ex 3.1: A coin is tossed n times: 𝑥 𝑅𝑆 𝑥 𝑠~𝐵𝐼𝑁 1, 𝜃 . What info do we need if we
1 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
want to estimate 𝜃?
Sol: 𝑓 𝑥 , … , 𝑥 ; 𝜃 𝜃∑ 1 𝜃 ∑
, 𝑥 0, 1
𝐷𝑒𝑓𝑖𝑛𝑒: 𝑆 𝑥 ~𝐵𝐼𝑁 𝑛, 𝜃
𝑛
𝑓 𝑠; 𝜃 𝜃 1 𝜃
𝑠
𝑃𝑋 𝑥 ,𝑋 𝑥 ,…,𝑋 𝑥 ;𝑆 𝑠
𝑓 ,…, | 𝑥 ,…,𝑥
𝑓 𝑠; 𝜃
𝑓 𝑥 ,..,𝑥
𝑛
𝜃 1 𝜃
𝑠
𝜃∑ 1 𝜃 ∑
𝑛
𝜃 1 𝜃
𝑠
1
𝑛 𝐹𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑜𝑓 𝜃.
𝑠
Ex. 3.2: RS 𝑥 𝑠~𝐸𝑋𝑃 𝜃 . Find the sufficient statistic.
,
,…,
1 ∑ 1 ∑
𝑒 𝑒 ∙ ⏞
1
𝜃 𝜃
1
𝑇ℎ𝑖𝑠 𝑠𝑢𝑔𝑔𝑒𝑠𝑡𝑠 𝑆 𝑥 ~Γ 𝑛, 𝜃 ⇒ 𝑓 𝑠; 𝜃 𝑠 𝑒 , 𝑠 0
𝜃 Γ 𝑛
𝑓 ,…, 𝑥 ,…,𝑥 ;𝜃
⇒ 𝑓 | 𝑥|𝑠
𝑓 𝑠; 𝜃
∑
1
𝑒
𝜃
1
𝑠 𝑒
𝜃 Γ 𝑛
Γ 𝑛
𝑔 𝜃
𝑠
∴𝑆 𝑥 𝑖𝑠 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡. #
𝑓 𝑥 ,…,𝑥 ;𝜃 𝑔 𝑆, 𝜃 ∙ ℎ 𝑥 , … , 𝑥
Sol:
𝑓 𝑥 ,…,𝑥 ;𝜃 𝜃∑ 1 𝜃 ∑
𝜃 1 𝜃 ∙ 1
𝑔 𝑠; 𝜃 ∙ ℎ 𝑥 , … , 𝑥
**Note: Any one‐to‐one function of S is also sufficient
Sol:
1
𝑓 𝑥 ,…,𝑥 ;𝜃 , 0 𝑥 𝜃, 𝑖 1, … , 𝑛
𝜃
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Method 1: By definition
𝑓 𝑥 ,…,𝑥 ;𝜃
Consider 𝑆 𝑥 : ⇒ 𝑔 𝜃
𝑓 𝑥
𝑥
𝐹 𝑥 𝐹 𝑥
:
𝜃
𝑥
𝑓 𝑥 𝑛 ,0 𝑥 𝜃
:
𝜃
Compute the conditional distribution:
1
𝑓 𝑥 ,…,𝑥 ;𝜃 𝜃 1
𝑔 𝜃
𝑓 : 𝑥 𝑥 𝑛𝑥
𝑛
𝜃
∴𝑆 𝑥 : is sufficient for 𝜃
Method 2: By factorization:
1
𝑓 𝑥 ,…,𝑥 ;𝜃 , 0 𝑥 𝜃 ⇔ 0 𝑥 : &
𝜃 𝑥 : 𝜃
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
1
, 0 𝑥: , 𝑥 : 𝜃 𝐼 1, 𝑖𝑓 𝑎 𝑥 𝑏
𝜃 , 𝑥
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
1
𝐼 , 𝑥 : 𝐼 , 𝑥 :
𝜃
1
𝑔 𝑥, 𝜃 ℎ 𝑥 , … , 𝑥 𝑤ℎ𝑒𝑟𝑒 𝑔 𝑥, 𝜃 𝐼 , 𝑥 : 𝑎𝑛𝑑 ℎ 𝑥 , … , 𝑥 𝐼 , 𝑥 :
𝜃
⇒ ∴ 𝑆 𝑥 : is sufficient for 𝜃
1
𝑓 𝑥 , 0 𝑥 𝜃, 𝜃 0
𝜃
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Sol: The likelihood function of the sample is:
1
𝑓 𝑥 ,…,𝑥 , 𝑖𝑓 0 𝑥 ,…,𝑥 𝜃,
𝜃
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
We can now write 𝑓 𝑥 , … , 𝑥 as
where
1, 𝑖𝑓 𝑥 , … , 𝑥 0
ℎ 𝑥 ,…,𝑥
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
and
1
𝑔 𝜃; 𝑥 , 𝑖𝑓 0 𝑥 𝜃,
𝜃
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
| |
Ex 3.5: RS 𝑥 𝑠~𝑓 𝑥; 𝜎 𝑒 , ∞ 𝑥 ∞, 𝜎 0
Sol: Use factorization:
1 | |
𝑓 𝑥 ,…,𝑥 ;𝜎 𝑒
2𝜎
1 1 ∑| |
𝑒 𝑤ℎ𝑒𝑟𝑒 𝑆 |𝑥 |
2 𝜎
1 1
ℎ 𝑥 ,…,𝑥 ∙ 𝑔 𝑆, 𝜎 𝑤ℎ𝑒𝑟𝑒 ℎ 𝑥 , … , 𝑥 𝑎𝑛𝑑 𝑔 𝑆, 𝜎 𝑒
2 𝜎
∴ 𝑆 |𝑥 | is sufficient for σ. #
𝛼
𝑓 𝑥; 𝛼, 𝛽 𝑥 𝑒 , 𝑥 0
𝛽
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
2
𝑓 𝑥; 2, 𝛽 𝑥𝑒 , 𝑥 0
𝛽
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Sol:
a) 𝑓 𝑥 , … , 𝑥 ; 2, 𝛽 ∏ 𝑥𝑒 𝐼 , 𝑥 where I is Indicator function (piecewise)
∑
2 𝑥𝐼 , 𝑥 ∙𝛽 𝑒
,
,…,
ℎ 𝑥 ,..,𝑥 ∙ 𝑔 𝑆, 𝛽 𝑤ℎ𝑒𝑟𝑒 𝑆 𝑥
∴ 𝑆 𝑥 𝑖𝑠 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑓𝑜𝑟 𝛽
b) (See later discussion)
Ex 3.7: RS xi’s from a one‐parameter Weibull distribution
𝛼
𝑓 𝑥; 𝛼, 2 𝑥 𝑒 , 𝑥 0
2
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Sol:𝑓 𝑥 , … , 𝑥 ; 𝛼, 2 ∏ 𝑥 𝑒 𝐼 , 𝑥
∑
𝑥 𝐼 , 𝑥 ∙𝛼 2 𝑥 𝑒
∑
ℎ 𝑥 ,…,𝑥 ∙ 𝛼 2 𝑥 𝑒
ℎ 𝑥 ,…,𝑥 ∙𝑔 𝑥 , 𝑥 ;𝛼
ℎ 𝑥 ,…,𝑥 ∙ 𝑔 𝑥 ,…,𝑥 ;𝛼
Note: Exponential families of probability distributions (e.g., Poisson, normal, gamma, and Bernoulli) have density functions of the form
$$f(x;\theta) = \begin{cases}\exp\!\left[k(x)\,c(\theta) + S(x) + d(\theta)\right], & x \in B\\ 0, & x \notin B\end{cases}$$
where B does not depend on the parameter θ.
Thm: Let $X_1,\ldots,X_n$ be a random sample from a population with pdf or pmf of the exponential form
$$f(x;\theta) = \begin{cases}\exp\!\left[k(x)\,c(\theta) + S(x) + d(\theta)\right], & x \in B\\ 0, & x \notin B.\end{cases}$$
Then $\sum_{i=1}^{n} k(X_i)$ is a sufficient statistic for θ.
Proof: The joint density
𝑓 𝑥 ,…,𝑥 ;𝜃 exp 𝑐 𝜃 𝑘 𝑥 𝑆 𝑥 𝑛𝑑 𝜃
exp 𝑐 𝜃 𝑘 𝑥 𝑛𝑑 𝜃 exp 𝑆 𝑥
Using the factorization theorem, the statistic ∑ 𝑘 𝑋 is sufficient.
***dim(S) may or may not be equal to dim(𝜽)
Def 3.2: Minimal Sufficient Statistic: S is a minimal sufficient statistic for 𝜃 if S is sufficient and if
dim 𝑆 dim 𝑇 for every other sufficient statistics, T.
**Methods for verifying whether a set of statistics is minimally sufficient are given in Wasan’s book in
1970 “Parametric Estimation”‐ McGraw‐Hill
Thm 3.2: Let S be a set of jointly sufficient statistic for 𝜃:
1) If 𝜃 is a unique MLE, then 𝜃 is a function of S
2) If 𝜃 is a unique MLE and jointly sufficient for 𝜃, then 𝜃 is minimally sufficient and a
function of S *********
**Two main usages of sufficient statistics:
1) Compress info for estimating parameters
2) Improve the accuracy of estimators
Thm 3.3: Rao‐Blackwell
1) T* is a function of S and doesn’t depend on 𝜃
2) T* is an unbiased estimator of 𝜏 𝜃
3) 𝑉𝑎𝑟 𝑇 ∗ 𝑉𝑎𝑟 𝑇 𝑓𝑜𝑟 𝑒𝑣𝑒𝑟𝑦 𝜃 𝑎𝑛𝑑 𝑉𝑎𝑟 𝑇 ∗ 𝑉𝑎𝑟 𝑇 𝑓𝑜𝑟 𝑠𝑜𝑚𝑒 𝜃
∗
𝑢𝑛𝑙𝑒𝑠𝑠 𝑇 𝑇
Note:
1) We can restrict to sufficient statistics
2) If we find unbiased T, then we can improve by E[T|S]
3) If there is only one 𝑇 ∗ ℎ 𝑆 𝐸 𝑇|𝑆 that is unbiased, then it must be the UMVUE
Def 3.3: Complete: A sufficient statistic is called complete if it is a unique unbiased estimator (after
adjustment)
Thm 3.4: Lehmann‐Scheffe (L‐S)
STEPS:
1) Find S that is complete sufficient for 𝜃.
2) Find unbiased estimator T for 𝜏 𝜃
a. Yes (2.1) if T is a function of S, then DONE ⇒ T is UMVUE
b. No (2.2) otherwise, compute 𝑇 ∗ 𝐸 𝑇|𝑆 ⇒ 𝑇 ∗ is UMVUE
Def 3.4: K‐Parameter Regular Exponential Class: (REC)
A density function is said to be a member of the regular K‐parameter exponential class:
If it can be expressed in the form
𝑓 𝑥; 𝜃 𝑐 𝜃 ℎ 𝑥 exp ∑ 𝑞 𝜃 𝑡 𝑥 𝑥∈𝐴 𝑥: 𝑓 𝑥; 𝜃 0
and zero otherwise, where 𝜃 𝜃 , … , 𝜃 is a vector of k unknown parameters
If the parameter space is the interval set
Ω 𝜃|𝑎 𝜃 𝑏, 𝑖 1, … , 𝑘
where a , b s are known constants and can be ∞, and if it satisfies regularity conditions 1,2,
and 3a or 3b given by:
1) The set 𝐴 𝑥: 𝑓 𝑥; 𝜃 0 does not depend on 𝜃 (Uniform is NOT REC)
2) 𝑞 𝜃 are nontrivial, functionally independent, continuous functions of 𝜃. 𝑞 𝜃 𝑔 𝑞 𝜃
3) a) For continuous rv, 𝑡 𝑥 ≢ 0 are linearly independent continuous functions of x over A
b) For discrete rv, 𝑡 𝑥 are nontrivial, linearly independent functions of x on A
Notes:
1) Notation: 𝑓 𝑥; 𝜃 ∈ 𝑅𝐸𝐶 𝑜𝑟 𝑅𝐸𝐶 𝑞 , … , 𝑞
2) The pdf of REC can also be expressed as:
𝑓 𝑥; 𝜃 exp 𝑞 𝜃 𝑡 𝑥 𝑙𝑛 𝑐 𝜃 𝑙𝑛 ℎ 𝑥
𝑒 , 𝑥 0
Sol:𝑓 𝑥; 𝜃 𝑒 𝐼 , 𝑥
0 , 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
1 1
𝐼 , 𝑥 exp ∙ ⏟
𝑥
⏟
𝜃 𝜃
𝑓 𝑥; 𝜃 𝑐 𝜃 ℎ 𝑥 exp 𝑞 𝜃 𝑡 𝑥
Other conditions:
1) 0 𝜃 ∞
2) 0 𝑥 ∞
3) 𝑞 𝜃 , 𝑛𝑜𝑛 𝑡𝑟𝑖𝑣𝑖𝑎𝑙
4) 𝑡 𝑥 𝑥, 𝑛𝑜𝑛 𝑡𝑟𝑖𝑣𝑖𝑎𝑙
∴ 𝐸𝑋𝑃 𝜃 ∈ 𝑅𝐸𝐶 #
𝑒 ∙𝑒 𝑥
∙𝐼 ,
𝑝
1 𝑝 ∙𝐼 , 𝑥 exp 𝑙𝑛 ∙𝑥
1 𝑝
𝑝
𝑐 𝑝 1 𝑝, ℎ 𝑥 𝐼 , 𝑥 , 𝑞 𝑝 𝑙𝑛 , 𝑡 𝑥 𝑥
1 𝑝
1) 𝐴 𝑥: 𝑓 𝑥; 𝑝 0 0,1 ⊥ 𝑝
2) 𝑞 𝑝 𝑙𝑛 Non‐trivial
3) 𝑡 𝑥 𝑥 Non‐trivial
⇒ ∴ 𝐵𝐼𝑁 1, 𝑝 ∈ 𝑅𝐸𝐶 #
Ex 3.10: 𝑥~Γ 𝛼, 𝛽 , 𝑓 𝑥; 𝛼, 𝛽 𝑥 𝑒 𝐼 , 𝑥 . Show that Γ 𝛼, 𝛽 ∈ 𝑅𝐸𝐶.
Sol:𝑓 𝑥; 𝛼, 𝛽 𝐼 , 𝑥 exp 𝛼 1 ln 𝑥 ⏟
𝑥
,
, ,
1) 𝐴 𝑥: 𝑓 𝑥; 𝛼, 𝛽 0 0, ∞ ⊥ 𝛼, 𝛽
𝑞 𝛼, 𝛽 𝛼 1
2) 𝑁𝑜𝑛 𝑡𝑟𝑖𝑣𝑖𝑎𝑙, 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛𝑎𝑙𝑙𝑦 ⊥
𝑞 𝛼, 𝛽
𝑡 𝑥 ln 𝑥
3) 𝑙𝑖𝑛𝑒𝑎𝑟 ⊥
𝑡 𝑥 𝑥 1
∴ Γ 𝛼, 𝛽 ∈ 𝑅𝐸𝐶 #
Thm 3.5: If 𝑥 , … , 𝑥 is a random sample from a member of the regular exponential class
𝑅𝐸𝐶 𝑞 , … , 𝑞 , then the statistics
𝑆 ,…,𝑆 𝑡 𝑥 , 𝑡 𝑥 ,…, 𝑡 𝑥
Are a minimal set of complete sufficient statistics for 𝜃 , … , 𝜃 .
Ex 3.11: In HW#3 Q.4, 𝑥 ~𝐸𝑋𝑃 𝜃 , 𝑇 ̅
is unbiased for 1/𝜃. Find the UMVUE for .
Sol: We know that 𝑥̅ achieve 𝐶𝑅𝐿𝐵 𝜃 ⇒ 𝑥̅ is UMVUE for 𝜃
1
∴ 𝑖𝑠 𝑛𝑜𝑡 𝑎 𝑙𝑖𝑛𝑒𝑎𝑟 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑜𝑓 𝜃
𝜃
1
∴ 𝑁𝑜 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑜𝑟 𝑐𝑎𝑛 𝑎𝑐ℎ𝑖𝑒𝑣𝑒 𝐶𝑅𝐿𝐵 .
𝜃
But UMVUE may still exist.
From Ex 3.8, EXP θ ∈ REC
n 11
n 1
Also, T is function of S and unbiased
∑𝑥
n 𝑥̅
1
∴ T is the UMVUE for 𝐿 𝑆 Thm 3.4 #
θ
Ex 3.13: Find the UMVUE for 𝜇 and 𝜎 of 𝑁 𝜇, 𝜎
Sol:
STEP 1: Find the complete & sufficient statistics for 𝜇 and 𝜎
Express 𝑁 𝜇, 𝜎 in REC form:
1 𝑥 𝜇
𝑓 𝑥; 𝜇, 𝜎 exp
√2𝜋𝜎 2𝜎
1 𝑥 𝑥𝜇 𝜇
exp
√2𝜋𝜎 2𝜎 𝜎 2𝜎
1 1 𝜇
𝑒 𝐼 exp 𝑥 𝑥
√2𝜋𝜎 2𝜎 𝜎
1 1 𝜇
𝑐 𝜇, 𝜎 𝑒 , ℎ 𝑥 𝐼, 𝑞 𝜇, 𝜎 , 𝑡 𝑥 𝑥 , 𝑞 𝜇, 𝜎 ,
√2𝜋𝜎 2𝜎 𝜎
𝑡 𝑥 𝑥
1) 𝐴 𝑥: 𝑓 𝑥; 𝜇, 𝜎 0 ∞, ∞ ⊥ 𝜇, 𝜎
𝑞 𝜇, 𝜎
2) Non‐trivial, functionally ⊥
𝑞 𝜇, 𝜎
𝑡 𝑥 2𝑥
3) Not linear of each other
𝑡 𝑥 1
STEP 2: Find an unbiased estimator that is a function of S1 and S2
∑𝑥
∴ 𝑀𝐿𝐸𝑠 ∶ 𝜇̂ 𝑔 𝑆
𝑛
∑𝑥
𝑛 𝑛 ∑ 𝑥 𝑥̅ ∑ 𝑥
𝑛
𝑆 𝜎
𝑛 1 𝑛 1 𝑛 𝑛 1
∑𝑥 ∑𝑥
∑𝑥 2∑𝑥
𝑛 𝑛
𝑛 1
∑𝑥
∑𝑥
𝑛
𝑔 𝑆 ,𝑆
𝑛 1
∴ By Thm 3.4 L S Thm , x and S are UMVUE for μ, σ #
Ex 3.14: RS𝑥 𝑠~𝑓 𝑥; 𝜃 𝜃 𝑥𝑒 ,0 𝑥 ∞, 𝑤ℎ𝑒𝑟𝑒 𝜃 0. Find the UMVUE for 𝜃.
Sol: STEP 1: Note that this is Γ 2,
REC form is:
𝑓 𝑥; 𝜃 𝜃 𝑥𝑒
𝑐 𝜃 𝜃 , ℎ 𝑥 𝑥, 𝑔 𝜃 𝜃, 𝑡 𝑥 𝑥
1) 𝐴 𝑥: 𝑓 𝑥; 𝜃 0 0, ∞ ⊥ 𝜃
2) 𝑞 𝜃 𝜃 Non‐trivial
3) 𝑡 𝑥 1 0
∴𝑆 𝑥 is complete & sufficient
STEP 2: Find an unbiased estimator for 𝜃
2 2𝑛
𝑇𝑟𝑦: 𝐸 𝑆 𝐸 𝑥 𝐸𝑥
𝜃 𝜃
1 1
𝐸
𝑆 𝐸𝑆
𝐸𝑆 𝐸 𝑥
Γ 2𝑛 1 1
Γ 2𝑛 𝜃
Γ 2𝑛 1 1
Γ 2𝑛 𝜃
2𝑛 2 !
𝜃
2𝑛 1 !
1
𝜃
2𝑛 1
2𝑛 1
∴𝑇 2𝑛 1 𝑆 2𝑛 1 𝑥 𝑥̅ is unbiased for 𝜃 .
𝑛
T is also the UMVUE for 𝜃 because it is a function of the complete & sufficient statistic. #
***Summary:
𝑆 𝑖𝑠 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡
1) 𝜃 𝑓 𝑆
𝜃 𝑖𝑠 𝑢𝑛𝑖𝑞𝑢𝑒
𝑆 𝑖𝑠 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑇∗ 𝐸 𝑇|𝑆 𝑖𝑠 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑓𝑜𝑟 𝜏 𝜃
2)
𝑇 𝑖𝑠 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑓𝑜𝑟 𝜏 𝜃 𝑉𝑎𝑟 𝑇 ∗ 𝑉𝑎𝑟 𝑇
𝑆 𝑖𝑠 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒 𝑎𝑛𝑑 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡
3) 𝑇∗ 𝑡 𝑆 𝑖𝑠 𝑈𝑀𝑉𝑈𝐸
𝑇 ∗ 𝑡 𝑆 𝑖𝑠 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑 𝑓𝑜𝑟 𝜏 𝜃
4) 𝑅𝐸𝐶 → 𝑆 ∑ 𝑡 𝑥 ,…,𝑆 ∑ 𝑡 𝑥 𝑖𝑠 𝑚𝑖𝑛𝑖𝑚𝑎𝑙 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒
& 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡
Chapter 4: Interval Estimation
1 .
𝑓 𝑥; 𝜃 𝑒 , ∞ 𝑥 ∞
√2𝜋0.8
What value of 𝜃 is believable?
∑
Sol: 𝜃 𝑥̅ 9.5
0.8
∴ 𝑥 ~𝑁 𝜃, 0.8 , ∴ 𝑥̅ ~𝑁 𝜃,
4
𝑥̅𝜃
0.8 𝑥̅ 𝜃
⇒ ~𝑁 0,1
√4 0.4
𝑥̅ 𝜃
1 𝛼 𝑃 𝑍 0.8 𝑍
√4
0.8 0.8
𝑃 𝑍 𝑥̅ 𝜃 𝑍 𝑥̅
√4 √4
0.8 0.8
𝑃 𝑥̅ 1.96 𝜃 𝑥̅ 1.96
√4 √4
Def 4.1: An interval 𝑙 𝑥 , … , 𝑥 , 𝑢 𝑥 , … , 𝑥 is called a 𝟏𝟎𝟎𝜸% Confidence Interval for 𝜃 if:
𝑃 𝑙 𝑥 ,…,𝑥 𝜃 𝑢 𝑥 ,…,𝑥 𝛾
Where 0 𝛾 1. The observed value 𝑙 𝑥 , … , 𝑥 and 𝑢 𝑥 , … , 𝑥 are called lower and upper
confidence limits, respectively.
If 𝑃 𝑙 𝑥 , … , 𝑥 𝜃 𝛾, then 𝑙 𝑥 , … , 𝑥 is called a one‐side lower 𝟏𝟎𝟎𝜸% confidence limit
for 𝜃
If 𝑃 𝜃 𝑢 𝑥 ,…,𝑥 𝛾, then 𝑢 𝑥 , … , 𝑥 is called a one‐side upper 𝟏𝟎𝟎𝜸% confidence
limit for 𝜃
Sol:𝛾 𝑃𝐿 𝜃 ⇒ 𝑊ℎ𝑎𝑡 𝑖𝑠 𝐿 ?
∑𝑥 2𝑛𝑥̅
∴ 𝑥̅ 𝑖𝑠 𝑠𝑢𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑓𝑜𝑟 𝜃 𝑥̅ ⇒ ~𝑋 2𝑛
𝑛 𝜃
̅
𝛾 𝑃 𝑋 2𝑛 ←γ percentile of chi square with 2n dof
2𝑛𝑥̅
𝑃 𝜃
𝑋 2𝑛
2𝑛𝑥̅
∴𝑙 𝑥 is the one side lower 100γ% CL for θ
𝑋 2𝑛
2𝑛𝑥̅
Similarly, 𝑢 𝑥 is the one side upper 100γ% CL for θ
𝑋 2𝑛
̅ ̅
𝑙 𝑥 ,𝑢 𝑥 , is two side 100 1 α % CI for 𝜃
*Note: For two‐side CI, if 𝛼 𝛼 then it is the
“equal‐tailed” choice.
⇒ Best choice (smallest width) if pdf is a single hump
distribution.
𝜎
∴ 𝐸 𝑤𝑖𝑑𝑡ℎ 𝐸 𝑤
⏟ 𝐸𝑢 𝑙 𝐸 2𝑍
√𝑛
𝜎 𝜎
2𝑍 𝑤ℎ𝑒𝑟𝑒 𝑑 𝑍
√𝑛 √𝑛
𝜎 2𝜎 2𝜎
⇒ 2𝑍 2𝑑 ⇒ 𝑛 𝑍 𝑍
√𝑛 2𝑑 𝑤
1
𝑛∝ #
𝑤
Ex 4.4: For a certain new model of microwave oven, it is desired to set a guarantee period so that only
𝛾% of the ovens sold will have had a major failure in this length of time. Assume that the time to the 1st
major failure is 𝑁 𝜇, 𝜎 , the guarantee period should end at 𝑡 𝜇 𝑍 𝜎. Suppose the company will
charge $C if a customer purchases the insurance for the oven. The cost for fixing a failure oven is $F.
Given a r.s. of time to failure 𝑥 , … , 𝑥 , how do you determine the insurance policy (pricing)?
Sol: 𝑇 𝑥̅ 𝑍 𝜎 𝐸𝑆 𝜎 𝐸 √𝑆 𝜎
Unbiased estimator for 𝜎 in
𝑛 1
𝑛 1Γ 2 ∑ 𝑥 𝑥̅
𝜎 𝑛
2 Γ 𝑛 1
2
𝐶.
𝐶 . 𝛾 . 𝐹⇒𝛾 .
𝐹
For 0.5 yr: 0.5 1 𝑍 .
0.5 ⇒ 𝑍 .
1⇒𝛾 . 0.159 16%
𝐶 . 0.16 ∗ 100 16 $
𝐶 0.5 ∗ 100 50 $ #
Def 4.2: Pivotal Quantity
Def 4.3:
E.g.s:
Location: 𝑥~𝐸𝑋𝑃 1, 𝜂
𝑒 , 𝑥 𝜂 𝑒 , 𝑥 0
𝑓 𝑥; 𝜂 𝑓 𝑥
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Scale: 𝑥~𝐸𝑋𝑃 𝜃
1 1
𝑓 𝑥; 𝜃 𝑒 , 𝑥 0 𝑓 𝑥 𝑒
𝜃 𝜃
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 0
Location & Scale: 𝑁 𝜇, 𝜎
1 1 1 1
𝑓 𝑥; 𝜇, 𝜎 𝑒 𝑓 𝑥 𝑒
√2𝜋𝜎 𝜎 𝜎 √2𝜋
1) 𝜃 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛 ∶ 𝑄 𝜃 𝜃 is a PQ (Ex 4.1)
2) 𝜃 𝑠𝑐𝑎𝑙𝑒: 𝑄 is a PQ (Ex 4.2)
Sol (Ex 4.5: rs from $N(\mu,\sigma^2)$ with both parameters unknown): $\hat\mu = \bar x$, $\hat\sigma = S$ (so $\hat\sigma^2 = S^2$).
PQ for μ:
$$Q = \frac{\bar x - \mu}{S/\sqrt{n}} \sim t(n-1)$$
$$\therefore\ 1-\alpha = P\!\left[-t_{1-\alpha/2}(n-1) \le \frac{\bar x - \mu}{S/\sqrt{n}} \le t_{1-\alpha/2}(n-1)\right] = P\!\left[\bar x - t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n} \le \mu \le \bar x + t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n}\right]$$
$$\therefore\ \left(\bar x - t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n},\ \bar x + t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n}\right)\ \text{is a } 100(1-\alpha)\%\ \text{CI for } \mu.$$
PQ for $\sigma^2$:
$$Q = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$
$$\therefore\ 1-\alpha = P\!\left[\chi^2_{\alpha/2}(n-1) \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2}(n-1)\right] = P\!\left[\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right]$$
$$\therefore\ \left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right)\ \text{is a } 100(1-\alpha)\%\ \text{CI for } \sigma^2. \quad\#$$
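A sketch computing both intervals for illustrative data (not from the notes), using scipy's t and chi-square quantile functions.

```python
# 95% CIs for mu (sigma unknown) and for sigma^2, for an illustrative normal sample.
import numpy as np
from scipy import stats

x = np.array([9.2, 10.1, 8.7, 9.9, 10.4, 9.5])   # illustrative data
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_mu = (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
ci_var = ((n - 1) * s**2 / chi2_hi, (n - 1) * s**2 / chi2_lo)
print(ci_mu, ci_var)
```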
𝜎 𝑘𝑛𝑜𝑤𝑛 𝐸𝑥 4.1
⎧𝜇
𝜎 𝑢𝑛𝑘𝑛𝑜𝑤𝑛 𝐸𝑥 4.5
For 𝑁 𝜇, 𝜎 ,
⎨𝜎 𝜇 𝑘𝑛𝑜𝑤𝑛 ⇒ 𝐸𝑥 4.6
⎩ 𝜇 𝑢𝑛𝑘𝑛𝑜𝑤𝑛 𝐸𝑥 4.5
∑ ∑ ̅
Sol: STEP 1: ~𝑋 𝑛 1
̅ ∑ 𝑥 𝜇 𝑥 𝜇
~𝑋 𝑛
𝜎 𝜎
STEP 2:
𝑥 𝜇
1 𝛼 𝑃 𝑋 𝑛 𝑋 𝑛
𝜎
∑ 𝑥 𝜇 ∑ 𝑥 𝜇
𝑃 𝜎
𝑋 𝑛 𝑋 𝑛
∑ 𝑥 𝜇 ∑ 𝑥 𝜇
∴ , 𝑖𝑠 𝑎 100 1 𝛼 % 𝐶𝐼 𝑓𝑜𝑟 𝜎 𝑤ℎ𝑒𝑛 𝜇 𝑖𝑠 𝑘𝑛𝑜𝑤𝑛 #
𝑋 𝑛 𝑋 𝑛
Compare two CIs for 𝜇 known/unknown:
∑𝒏
𝒊 𝟏 𝒙𝒊 𝝁
𝟐 ∑𝒏 𝒙 𝝁 𝟐
𝟐 , 𝒊 𝟏𝟐 𝒊
𝑿 𝜶 𝒏 𝑿𝜶 𝒏
𝟏
𝟐 𝟐
∑ ∑
∑𝒏
𝒊 𝟏 𝒙𝒊 𝒙
𝟐 ∑𝒏 𝒙 𝒙 𝟐
, , , 𝒊 𝟐𝟏 𝒊
𝑿𝟐 𝜶 𝒏 𝟏 𝑿𝜶 𝒏 𝟏
𝟏
𝟐 𝟐
Two Sample Problem:
Note: ⋚ 𝑝𝑜𝑠𝑒𝑠 𝑞𝑢𝑒𝑠𝑡𝑖𝑜𝑛 𝑖𝑓 𝑙𝑒𝑠𝑠 𝑡ℎ𝑎𝑛, 𝑒𝑞𝑢𝑎𝑙 𝑡𝑜, 𝑜𝑟 𝑔𝑟𝑒𝑎𝑡𝑒𝑟 𝑡ℎ𝑎𝑛 𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛
𝑥 𝑠~𝑁 𝜇 , 𝜎 𝜎
𝜇 𝜇 ⋚0 ⋚1
𝑌 𝑠~𝑁 𝜇 , 𝜎 𝜎
**Mean: 𝝁𝟏 𝝁𝟐 𝑥 , … , 𝑥 ~𝑁 𝜇 , 𝜎 , 𝑌 , … , 𝑌 ~𝑁 𝜇 , 𝜎
❶ 𝝈𝟐𝟏 𝒂𝒏𝒅 𝝈𝟐𝟐 are known
∴ 𝑎 𝑥 ~𝑁 𝑎𝜇 , 𝑎 𝜎
𝑥 𝜎 ⎫
𝑥̅ ~
⏞ 𝑁 𝜇 , ⎪
⎪
𝑛 𝑛 𝜎 𝜎
𝑥̅ 𝑌~𝑁 𝜇 𝜇 ,
⎬ 𝑛 𝑛
𝑌 𝜎 ⎪
𝑌 ~
⏞ 𝑁 𝜇 , ⎪
𝑛 𝑛 ⎭
𝑥̅ 𝑌 𝜇 𝜇
⇒ ~ 𝑁 0,1
𝜎 𝜎
𝑛 𝑛
⎡ ⎤
⎢ ⎥
𝑥̅ 𝑌 𝜇 𝜇
∴1 𝛼 𝑃⎢ 𝑍 𝑍 ⎥
⎢ ⎥
𝜎 𝜎
⎢ ⎥
𝑛 𝑛
⎣ ⎦
𝜎 𝜎 𝜎 𝜎
𝑃 𝑥̅ 𝑌 𝑍 𝜇 𝜇 𝑥̅ 𝑌 𝑍
𝑛 𝑛 𝑛 𝑛
𝜎 𝜎 𝜎 𝜎
∴ 𝑥̅ 𝑌 𝑍 , 𝑥̅ 𝑌 𝑍 𝑖𝑠 100 1 𝛼 % 𝐶𝐼 𝑓𝑜𝑟 𝜇 𝜇
𝑛 𝑛 𝑛 𝑛
∑ 𝑥 𝑥̅ ∑ 𝑌 𝑌 ∑ 𝑥 𝑥̅ ∑ 𝑌 𝑌
𝑛 1 𝑛 1
𝑛 1 𝑛 1
𝑛 𝑛 2 𝑛 𝑛 2
Then:
(i) ~𝑋 𝑛 𝑛 2
(ii) 𝐸𝑆 𝜎
(iii) ~𝑡 𝑛 𝑛 2
Sol:
𝑛 1 𝑆 𝑛 1 𝑆
∴ ~𝑋 𝑛 𝑛 2
𝜎
ii) 𝐸 𝑛 𝑛 2⇒𝐸 𝜎
𝑋↔𝑆
iii) ∴ 𝑌 𝑋⊥𝑆 𝑁𝑜𝑡𝑒: 𝑌 ⊥ 𝑆 𝑎𝑛𝑑 𝑋 ⊥ 𝑆
𝑌↔𝑆
𝑌 𝑋 𝜇 𝜇
∴ ~ 𝑁 0,1
1 1
𝜎
𝑛 𝑛
,
𝑡
𝑌 𝑋 𝜇 𝜇
1 1
𝜎
𝑛 𝑛
⇒
𝑛 𝑛 2 𝑆 1
𝜎 𝑛 𝑛 2
𝑌 𝑋 𝜇 𝜇
~𝑡 𝑛 𝑛 2
1 1
𝑆
𝑛 𝑛
⎡ ⎤
𝑌 𝑋 𝜇 𝜇
∴1 𝛼 𝑃⎢ 𝑡 𝑛 𝑛 2 𝑡 𝑛 𝑛 2 ⎥
⎢ 1 1 ⎥
𝑆
⎣ 𝑛 𝑛 ⎦
1 1 1 1
⇒ 𝑌 𝑋 𝑡 𝑛 𝑛 2 𝑆 ,𝑌 𝑋 𝑡 𝑛 𝑛 2 𝑆
𝑛 𝑛 𝑛 𝑛
*If 𝜎 𝜎 both unknown then an approximated PQ is:
𝑆 𝑆
𝑌 𝑋 𝜇 𝜇 𝑛 𝑛
~𝑡 𝛾 𝑤ℎ𝑒𝑟𝑒 𝛾
𝑆 𝑆 𝑆 𝑆
𝑛 𝑛 𝑛 𝑛
𝑛 1 𝑛 1
𝝈𝟐𝟐
**Variance:
𝝈𝟐𝟏
❸When 𝝁𝟏 , 𝝁𝟐 are unknown
𝑛 1 𝑆 𝑛 1 𝑆
~𝑋 𝑛 1 , ~𝑋 𝑛 1
𝜎 𝜎
∙
∴ ∙ ~𝐹 𝑛 1, 𝑛 1
∙
𝑆 𝜎
∴1 𝛼 𝑃 𝑓 𝑛 1, 𝑛 1 ∙
𝑆 𝜎
𝑓 𝑛 1, 𝑛 1
****SUMMARY:
I. 𝑥 ~𝐸𝑋𝑃 𝜃
II. 𝑥 ~𝑁 𝜇, 𝜎
A: 𝜇 1) 𝜎 known 𝜎 Ex 4.1
𝑥̅ 𝑍
√𝑛
2) 𝜎 unknown 𝑆 Ex 4.5
𝑥̅ 𝑡 𝑛 1
√𝑛
2) 𝜇 known ∑ 𝑥 𝜇 ∑ 𝑥 𝜇 Ex 4.6
,
𝑋 𝑛 𝑋 𝑛
III. 𝑛 : 𝑥 ~𝑁 𝜇 , 𝜎 , 𝑛 : 𝑌 ~𝑁 𝜇 , 𝜎
𝜎 𝜎
𝑥̅ 𝑌 𝑍
𝑛 𝑛
2) 𝜎 𝜎 𝜎 After Thm 4.2
1 1
unknown 𝑌 𝑋 𝑡 𝑛 𝑛 2 𝑆 , 𝑌
𝑛 𝑛
𝑋
𝑡 𝑛 𝑛
1 1
2 𝑆
𝑛 𝑛
where 𝑆
Approximated CIs
∑
Def 4.4: Consider a r.v. xn indexed by sample size n (e.g., 𝑥 ). We say that 𝑥 converges in
probability to constant C iff for every 𝜀 0:
lim 𝑃 |𝑥 𝑐| 𝜀 1 𝑤𝑟𝑖𝑡𝑡𝑒𝑛 𝑥 → 𝐶
→
lim 𝑃 𝜃 𝜃 𝜀 1
→
or equivalently
lim 𝑃 𝜃 𝜃 𝜀 0
→
Def 4.6: A sequence of r.v. 𝑥 is called mean square error consistent (MSE)
Note: 𝐸 𝑥 𝜏 𝜃
𝑀𝑆𝐸 𝑇 𝐸 𝑇 𝜏 𝜃 𝑉𝑎𝑟 𝑇 𝐸𝑇 𝜏 𝜃
** ∴
𝑀𝑆𝐸 𝑇 → 0 𝑖𝑓 𝑉𝑎𝑟 𝑇 → 0 & 𝑇 𝑖𝑠 𝑎𝑠𝑦𝑚𝑝𝑡𝑜𝑡𝑖𝑐𝑎𝑙𝑙𝑦 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑
Thm 4.3: A sequence of rv 𝑇 of estimators of 𝜏 𝜃 is MSE consistent iff
1) lim 𝑉𝑎𝑟 𝑇 0 and
→
2) lim 𝐸 𝑇 𝜏 𝜃
→
Thm 4.4: If a sequence 𝑇 is MSE consistent, it is also simply consistent.
Ex 4.7: RS 𝑥 𝑠~ 𝐸𝑋𝑃 𝜃 , 𝑇 ̅
, show that 𝑇 is biased for , but it is simple consistent for .
̅
Sol: ∴ ~𝑋 2𝑛 ⇒ 𝐸 𝑇 𝐸 𝑥̅ 𝐸
2𝑛 1 𝑛 1 1
∙ ∙
𝜃 2 𝑛 1 𝑛 1 𝜃 𝜃
𝑛 1 1
𝑉𝑎𝑟 𝑇 𝐸𝑇 𝐸 𝑇 𝐸𝑇
𝑛 1 𝑛 2𝜃
𝑛 1 1
∴ 𝐴𝑠 𝑛 → ∞ lim 𝐸 𝑇 lim 𝐴𝑠𝑦𝑚𝑝𝑡𝑜𝑡𝑖𝑐𝑎𝑙𝑙𝑦 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑
→ → 𝑛 1𝜃 𝜃
𝑛 1 1
lim 𝑉𝑎𝑟 𝑇 lim 0
→ → 𝑛 1 𝑛 2𝜃
Sufficient Condition for Consistency of an Unbiased Estimator:
An unbiased estimator 𝜃 of 𝜃 is a consistent estimator for 𝜃 if lim 𝑉𝑎𝑟 𝜃 0
→
Chebyshev’s Inequality: 𝑃 |𝑋 𝜇| 𝜀
lim 𝐸 𝜃 𝜃 0
→
Then 𝜃 is a consistent estimator of 𝜃.
Proof: Using Chebyshev’s inequality, we obtain
𝐸 𝜃 𝜃
𝑃 𝜃 𝜃 𝜀
𝑠
Because
lim 𝐸 𝜃 𝜃 0, 𝑏𝑦 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠
→
The right hand side converges to zero. To elaborate, by definition a consistent estimator converges to
the real value as n approaches infinity. Thus,
lim 𝑃 𝜃 𝜃 𝜀 0
→
Consequently, 𝜃 is a consistent estimator of 𝜃
Procedure to test for consistency:
1) Check whether the estimator $\hat\theta$ is unbiased or not
2) Calculate 𝑉𝑎𝑟 𝜃 and Β 𝜃 , the bias of 𝜃
3) An unbiased estimator is consistent if 𝑉𝑎𝑟 𝜃 → 0 𝑎𝑠 𝑛 → ∞
4) A biased estimator is consistent if both 𝑉𝑎𝑟 𝜃 → 0 𝑎𝑛𝑑 𝐵 𝜃 → 0 𝑎𝑠 𝑛 → ∞
Ex. Let 𝑋 , … , 𝑋 be a random sample from 𝑁 𝜇, 𝜎 population.
1) Show that the sample variance 𝑆 is a consistent estimator for 𝜎
2) Show that the maximum likelihood estimators for 𝜇 and 𝜎 are consistent estimators for 𝜇 and
𝜎
Sol:
a) We have already seen that 𝐸𝑆 𝜎 , and hence, 𝑆 is an unbiased estimator of 𝜎 . Because
the sample is drawn from a normal distribution, we know that has a chi‐square
distribution with (n‐1) dof. And
𝑛 1 𝑆
𝑉𝑎𝑟 2 𝑛 1
𝜎
Thus,
𝑛 1 𝑆 𝑛 1
2 𝑛 1 𝑉𝑎𝑟 𝑉𝑎𝑟 𝑆
𝜎 𝜎
2𝜎
𝑉𝑎𝑟 𝑆 → 0 𝑎𝑠 𝑛 → ∞
𝑛 1
Hence S2 is a consistent estimator of the variance of a normal population.
$\bar X$ is an unbiased estimator of μ, and $\operatorname{Var}(\bar X) = \sigma^2/n \to 0$ as $n \to \infty$. Therefore, from the theorem on consistency of an unbiased estimator, $\bar X$ is a consistent estimator of μ.
Now we will use the identity
𝐸 𝜃 𝜃 𝑉𝑎𝑟 𝜃 Β 𝜃
To show that the MLE for 𝜎 is biased with
𝑛 1
𝐸 𝜎 𝜎
𝑛
And
𝑛 1 1
Β 𝜎 𝜎 𝜎 𝜎
𝑛 𝑛
Thus, 𝜎 ∑ 𝑋 𝑋 𝑆 . Using part (a), we get
𝑛 1 𝑛 1 2𝜎 2 𝑛 1 𝜎
𝑉𝑎𝑟 𝜎 𝑉𝑎𝑟 𝑆
𝑛 𝑛 𝑛 1 𝑛
Therefore,
𝜎
lim Β 𝜎 lim 0
→ → 𝑛
2 𝑛 1 𝜎
lim 𝑉𝑎𝑟 𝜎 lim 0
→ → 𝑛
By the test for consistency, 𝜎 ∑ 𝑋 𝑋 is a consistent estimator of 𝜎 .
𝐼𝑛𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦
lim 𝑔 𝑇 𝑔 lim 𝑇 𝑔 𝜏 𝜃 lim √
Ex: → → → ̅ ̅
→
𝐵𝑢𝑡 𝐸 𝑔 𝑇 𝑔 𝐸𝑇
𝐸 ̅ ̅
Difference between unbiasedness and consistency****:
∑
Ex: 𝑇 estimator for E[X], show 𝑇 → 𝐸 𝑋
∑
1) lim 𝐸 𝑇 lim 𝐸 𝐸𝑋
→ →
∑ ∑
2) lim 𝑉𝑎𝑟 𝑇 lim 𝑉𝑎𝑟 ∙ lim lim 0
→ → → →
Σ𝑥 𝑛 ∑𝑥 𝑛
However, E 𝑇 𝐸 𝐸 𝐸 𝑋 → 𝑏𝑖𝑎𝑠𝑒𝑑
𝑛 1 𝑛 1 𝑛 𝑛 1
𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 ↛ 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑𝑛𝑒𝑠𝑠
∑ ∑ ∑
Ex. 𝑇 𝑥 ∶𝐸 𝑇 𝐸𝑥 𝐸
1 1 𝑛 1 𝐸𝑋
𝐸𝑋
2 2 𝑛 1
𝐸𝑋
1) Must be asymptotically unbiased
∑ ∑
2) lim 𝑉𝑎𝑟 𝑇 lim 𝑉𝑎𝑟 𝑥 𝑉𝑎𝑟 lim 𝑉𝑎𝑟 𝑥
→ → →
1 1 𝜎
lim 𝜎
→ 4 4𝑛 1
1
𝜎 0
4
⇒ Not Consistent
𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑡𝑐𝑦 ↚ 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑𝑛𝑒𝑠𝑠
∑ ∑ ∑
Ex. 𝑇 ∶ lim 𝑉𝑎𝑟 𝑇 lim 𝑉𝑎𝑟 lim lim 0
→ → → →
1∑ 𝑥 1 ∑𝑥 1
𝐸𝑇 𝐸 𝐸 𝐸𝑋 𝐸𝑋
2 𝑛 2 𝑛 2
→
𝑉𝑎𝑟 𝑇 ⎯⎯ 0 ↛ 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑𝑛𝑒𝑠𝑠
lim 𝐺 𝑦 𝐺 𝑦 𝑁𝑜𝑡𝑒𝑑: 𝑌 → 𝑌
→
1, 𝑦 1
Ex 4.8: RS 𝑥 , … , 𝑥 ~𝑈𝑁𝐼𝐹 0,1 , if 𝑌 𝑋 : , show 𝑌 → 𝑌, where 𝑌~𝐺 𝑦
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
1, 1 𝑦
Sol: ∴ 𝐺 𝑦 𝑦 , 0 𝑦 1
0, 𝑦 0
→ 1, 1 𝑦
⎯⎯⎯ 𝐺 𝑦
0, 𝑦 1
1) 𝑋 𝑌 →𝐶 𝑌
2) 𝑋 𝑌 → 𝐶𝑌
3) → 𝐶 0
Thm 4.8: CLT: RS xi’s from any distribution with mean 𝜇 and variance 𝜎 ∞,
∑ 𝑥
𝜇
𝑛 → 𝑁 0,1 , 𝑛 → ∞
𝜎
𝑛
∑
Sol: We know 𝑝̂ , 𝑉𝑎𝑟 𝑋 𝑝 1 𝑝
𝑋 𝐸𝑋 𝑝̂ 𝑝
⇒ → 𝑁 0,1 𝑏𝑦 𝐶𝐿𝑇
𝑉𝑎𝑟 𝑋 𝑝 1 𝑝
𝑛 𝑛
Since 𝑝̂ → 𝑝 ⇒ 𝑝̂ 1 𝑝̂ → 𝑝 1 𝑝
𝑝̂ 1 𝑝̂ 𝑝̂ 1 𝑝̂
⇒ →1 ⇒ → 1
𝑝 1 𝑝 𝑝 1 𝑝
𝑝̂ 𝑝
𝑝 1 𝑝
𝑛 𝑁 0,1
→ 𝑁 0,1 𝑁𝑜𝑡𝑒: 𝑛𝑢𝑚𝑒𝑟𝑎𝑡𝑜𝑟 → 𝑁 0,1 , 𝑑𝑒𝑛𝑜𝑚𝑖𝑛𝑎𝑡𝑜𝑟 → 1
1
𝑝̂ 1 𝑝̂
𝑝 1 𝑝
𝑝̂ 𝑝
→ 𝑁 0,1 ~𝐴𝑁 0,1
𝑝̂ 1 𝑝̂
𝑛
Proof: ∴ 𝑌 ∑ 𝑥, 𝑥 ~𝑋 1 ⊥
∑ 𝑥 𝑌 𝑉𝑎𝑟 𝑥 2
∴𝑋 and 𝐸 𝑋 1, 𝑉𝑎𝑟 𝑋
𝛾 𝛾 𝛾 𝛾
𝑌
𝑋 𝐸𝑋 1 𝑌 𝛾
𝛾
By CLT: #
𝑉𝑎𝑟 𝑋 2 2𝛾
𝛾 𝛾
Thm 4.10:
and if 𝑔 𝑥 : 𝑔 𝜇 0
𝑔 𝑥 𝑔 𝜇
|𝜎𝑔 𝜇 |
𝑡ℎ𝑒𝑛, ~𝐴𝑁 0,1
√𝑛
𝑆 →𝜎 𝑜𝑟 𝑆 → 𝜎
Proof: (Use Chebychev Inequality)
Ex. 4.10: RS 𝑋 𝑠~𝑁 𝜇, 𝜎 . Find the limiting distribution of Sn2
Sol: We know 𝑉 ~𝑋 𝑛 1
𝑉 𝑛 1
By Thm 4.9, → 𝑍~𝑁 0,1
2 𝑛 1
𝑛 1 𝑆
𝑛 1 1 𝑆 𝜎
𝜎 𝑠𝑖𝑚𝑝𝑙𝑖𝑓𝑦 √𝑛
↪
2 𝑛 1 𝜎 √2
𝑆 𝜎
~ 𝐴𝑁 0,1
2𝜎
𝑛 1
2𝜎
∴ 𝑆 →𝑁 𝜎 , #
𝑛 1
Sol: ⁄√ ⁄√
∙ → 𝑁 0,1 ∙ 1 𝑁 0,1
𝜎
𝑆 →𝜎 ⇒ →1 #
𝑆
Claim:
Let 𝑥 , … , 𝑥 be a random sample from pdf 𝑓 𝑥; 𝜃 , 𝜃 ∈ Ω. Then the MLE estimator is asymptotically
normal and asymptotically efficient ie:
⎛ 1 ⎞
𝜃 → 𝑁 ⎜𝜃, ⎟
𝜕
𝑛𝐸 ln 𝑓 𝑥; 𝜃
𝜕𝜃
⎝ ⎠
If the following regularity conditions are satisfied,
0) The pdfs are distinct, i.e., 𝜃 𝜃 ⇒ 𝑓 𝑥 ;𝜃 𝑓 𝑥 ; 𝜃 .
1) The pdfs have common support for all 𝜃
2) The point 𝜃 is an interior point in Ω
3) The integral 𝑓 𝑥; 𝜃 𝑑𝑥 can be differentiated twice under the integral sign as a function of 𝜃
4) The pdf 𝑓 𝑥; 𝜃 is three times differentiable as a function of 𝜃. Further, ∀𝜃 ∈ Ω, ∃ 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 𝐶
𝑎𝑛𝑑 𝑀 𝑥 𝑠𝑢𝑐ℎ 𝑡ℎ𝑎𝑡
IV: Approximated PQ (assume n large)
1) RS 𝑥 𝑠 from any distribution with mean 𝜇 and 𝜎 both unknown
̅
Approximated PQ for 𝜇: ⁄√
~𝑁 0,1 ← 𝜎 𝑢𝑛𝑘𝑛𝑜𝑤𝑛
𝑥̅ 𝜇
~𝐴𝑁 0,1
𝑆⁄√𝑛
2) 𝑥 , … , 𝑥 ⊥ 𝑌 , … , 𝑌 from any distributions with 𝜇 , 𝜎 and 𝜇 , 𝜎 → all unknown
Approximated PQ for 𝜇 𝜇 :
𝑌 𝑋 𝜇 𝜇
~𝐴𝑁 0,1
𝑆 𝑆
𝑛 𝑛
3) 𝑥 , … , 𝑥 ~𝐵𝐼𝑁 1, 𝑝 :
𝑋 𝑝
Approximated PQ for 𝑝: ~𝐴𝑁 0,1
𝑝 1 𝑝
𝑛
→ 𝑋 𝑝
~𝐴𝑁 0,1
𝑋 1 𝑋
𝑛
4) 𝑥 , … , 𝑥 , ⊥ 𝑌 , … , 𝑌 from 𝐵𝐼𝑁 1, 𝑝 and 𝐵𝐼𝑁 1, 𝑝 respectively.
Approximated PQ for 𝑝 𝑝 :
𝜎
𝑋 ~𝐴𝑁 𝑝 ,
𝑛
𝑌 𝑋 𝑝 𝑝 𝑌 𝑋 𝑝 𝑝 𝜎
~𝐴𝑁 0,1 ⇒ 𝑋 ~𝐴𝑁 𝑝 ,
𝑛
𝜎 𝜎 𝑌 1 𝑌 𝑋 1 𝑋
𝑛 𝑛 𝑛 𝑛 𝜎 𝜎
𝑋 𝑋 ~𝐴𝑁 𝑝 𝑝 ,
𝑛 𝑛
5) 𝑥 , … , 𝑥 from 𝑃𝑂𝐼 𝜇
Approximated PQ for 𝜇: See Quiz
𝑋 𝜇
~𝐴𝑁 0,1
𝜇⁄𝑛
𝑋 𝜇
~𝐴𝑁 0,1
𝑋⁄𝑛
****Summary:****
𝑠𝑡𝑟𝑜𝑛𝑔𝑒𝑟
𝑐𝑜𝑛𝑣𝑒𝑟𝑔𝑒𝑛𝑐𝑒 𝑖𝑛 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 ⇒ 𝑋 → 𝑋
⎯⎯⎯ 𝑆𝑖𝑚𝑝𝑙𝑦 𝑐𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑡
1) MSE consistent ⇒𝑋 →𝜇
1 𝐴𝑠𝑦𝑚𝑝𝑡𝑜𝑡𝑖𝑐𝑎𝑙𝑙𝑦 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑
⇒ 2 lim 𝑉𝑎𝑟 𝑋 0
→
𝑋 𝑌 →𝜇 𝑌
𝐼𝑓 𝑥 → 𝜇 𝑐𝑜𝑛𝑠𝑡 , 𝑡ℎ𝑒𝑛 𝑔 𝑥 → 𝑔 𝜇 , 𝑔 𝑖𝑠 𝑐𝑜𝑛𝑡𝑖𝑛𝑢𝑜𝑢𝑠
2) 𝑋 ∙ 𝑌 → 𝜇𝑌
𝐼𝑓 𝑌 → 𝑌 𝑟. 𝑣. , 𝑡ℎ𝑒𝑛 𝑔 𝑌 → 𝑔 𝑌 , 𝑔 𝑖𝑠 𝑐𝑜𝑛𝑡𝑖𝑛𝑢𝑜𝑢𝑠
→
𝑋 ~𝐴𝑁 𝜇,
3) CLT:
𝑆 →𝜎
5) ~𝐴𝑁 0,1
√
a) xi's are Normal and 𝜎 known 𝜎 𝜎
𝑋 𝑍 ,𝑋 𝑍
√𝑛 √𝑛
b) xi's are Normal but 𝜎 unknown
𝑆 𝑆
𝑋 𝑡 , 𝑋 𝑡
c) xi's are NOT Normal but 𝑛 large √𝑛 √𝑛
𝑆 𝑆
𝑋 𝑍 ,𝑋 𝑍
√𝑛 √𝑛
Ex. 4.11: In order to compare the speed of two types of CPUs: Intel Pentium 4(T) vs AMD Athlon XP (II),
18 random samples of Pentium 4 were taken and the average processing time is 𝑋
1198 ,S 3.2 . 12 random samples of Athlon XP were taken and the average processing
time is 𝑌 1202 , S 4.3 . Assume both samples are Normal with equal variances.
What is the 95% CI for 𝜇 𝜇 ?
1 1 1 1
𝑦 𝑥̅ 𝑡 n n 2 S , 𝑦 𝑥̅ 𝑡 n n 2 S
𝑛 𝑛 𝑛 𝑛
n 1 S n 1 S
where S
n n 2
⇒𝑆 3.67
Ex. 4.12: In Ex 4.11, suppose now we take n1=21 samples and obtain S1=3.2, and n2=16 and S2=4.3.
Assume the samples are independent and follow𝑁 𝜇 , 𝜎 𝑎𝑛𝑑 𝑁 𝜇 , 𝜎 , 𝜇 , 𝜎 , 𝑖
1,2 𝑎𝑟𝑒 𝑢𝑛𝑘𝑛𝑜𝑤𝑛. What is 90% CI for 𝜎 /𝜎 ?
Sol: This is Case 3.B.1: 𝑓 𝑛 1, 𝑛 1 , 𝑓 𝑛 1, 𝑛 1
𝑓 𝑛 1, 𝑛 1 𝑓. 20,15 2.33
1 1
𝑓 𝑛 1, 𝑛 1 𝑓. 20,15 0.45
𝑓. 15,20 2.20
𝜎
∴ 90% 𝐶𝐼 𝑓𝑜𝑟 𝑖𝑠 0.813, 4.207 #
𝜎
Ex 4.13: Assume that at HSBC the amount of a customer's deposit follows $N(\mu,\sigma^2)$, where $\sigma = 1000$ and μ is unknown. Suppose we want to estimate μ so that the absolute error (precision) is at most 50 dollars under a 95% confidence level. What sample size is required?
$$\Rightarrow P(-50 \le \bar x - \mu \le 50) = 0.95$$
$$P\!\left[\frac{-50}{\sigma/\sqrt n} \le \frac{\bar x - \mu}{\sigma/\sqrt n} \le \frac{50}{\sigma/\sqrt n}\right] = 0.95 \Rightarrow \frac{50}{\sigma/\sqrt n} = Z_{1-\alpha/2} = Z_{0.975} = 1.96$$
$$\Rightarrow n \ge \left(\frac{1.96\,\sigma}{50}\right)^{2} = 1536.64 \;\Rightarrow\; n = 1537. \quad\#$$
Ex 4.14: To buy a 30‐sec commercial break during the telecast of Super Bowl XXIX(the 29th Super Bowl,
the championship game of NFL) cost approximately $1,000,000. Potential sponsors want to find out how
many people might be watching. In a survey of 1015 potential viewers, 734 said they will watch more
than a quarter of the advertisements aired during the game. Want 90% CI for that probability.
Sol: Case IV.3
Assuming n=1015 is large, p =probability of a potential viewer watches the
commercial
𝑋 1 𝑋 𝑋 1 𝑋
𝑃 𝑋 𝑍 𝑝 𝑋 𝑍
𝑛 𝑛
734
⇒ 𝑋 0.72, 𝑍 𝑍 . 1.64
1015
**** Question: What is n=? if we want |𝑋 𝑝| 𝑑 under 95% confidence?
Sol: 1 𝛼 𝑃 |𝑋 𝑝| 𝑑 𝑃 𝑑 𝑋 𝑝 𝑑
𝑃
When n is large, 𝑍
𝑍
⇒ 𝑛 𝑝 1 𝑝 ∙
𝑑
1 𝑍
𝑛∗
4 𝑑
.
If 𝑑 0.01, 𝛼 5%, 𝑡ℎ𝑒𝑛: 𝑛∗ 9604 #
.
Ex 4.15: A manufacturer inspects a RS of 200 items from a process and 20% are defective. After an improvement to the process, another RS of size 200 is taken and 12% are defective. Does the improvement help, at a 90% confidence level?
⎡ ⎤
⎢ ⎥
𝑌 𝑋 𝑝 𝑝
1 𝛼 𝑃⎢ 𝑍 𝑍 ⎥
⎢ ⎥
𝑌 1 𝑌 𝑋 1 𝑋
⎢ ⎥
𝑛 𝑛
⎣ ⎦
𝑌 1 𝑌 𝑋 1 𝑋 𝑌 1 𝑌 𝑋 1 𝑋
𝑃 𝑌 𝑋 𝑍 𝑝 𝑝 𝑌 𝑋 𝑍
𝑛 𝑛 𝑛 𝑛
𝐷 0.02 2 1
0.166 #
𝑊𝑖𝑑𝑡ℎ 0.12 12 6
Paired‐Sample Method:
For example, when measuring the effectiveness of diet plan, Let Xi andYi be the weight of the ith
individual before and after the implementation of the diet plan, respectively. Then, 𝑥 , … , 𝑥 would be
independent (same for 𝑌 , … , 𝑌 ) But the pair 𝑋 , 𝑌 would be dependent.
A PQ for 𝜇 𝜇 𝜇 : 𝑇 ,
√
∑ 𝑜 ∑ 𝐷 𝐷
𝑤ℎ𝑒𝑟𝑒 𝐷 , 𝑆
𝑛 𝑛 1
𝑆 𝑆
𝑑̅ 𝑡 𝑛 1 , 𝑑̅ 𝑡 𝑛 1 #
√𝑛 √𝑛
Ex 4.16: A RS of 41 marginally overweight non‐smoking men was taken and their blood pressure was measured. After a diet plan had been implemented for 3 months, their blood pressure was measured again. We want a 99% CI for $\mu_1 - \mu_2$.
Sol: 𝛼 0.01,
𝑡 𝑛 1 𝑡 . 40 2.704
𝑑̅ 9, 𝑆 2.6
∴ 99% 𝐶𝐼 𝑓𝑜𝑟 𝜇 𝜇
𝑆
𝑑̅ 𝑡 𝑛 1 , 𝑑̅
√𝑛
𝑆
𝑡 𝑛 1
√𝑛
⇒ 7.9, 10.1
#
1 General Method
Other Methods:
2 CDF Approach
There are other more advanced topics for CI estimation:
1) Conservative CIs
2) Monte Carlo Simulation
3) Bayesian Intervals ⇒ "Getting something from nothing"
4) Bootstrap CI’s
5) CIs for Stochastic processes(nonstationary processes)
(CI is a function of time t.)
6) Prediction Intervals
7) Tolerance Intervals
𝑛 20: 131.7, 182.7, 73.3, 10.7, 150.4, 42.3, 22.2, 17.9, 264.0, 154.4, 4.3, 215.6, 61.9,
Find the 100 1 𝛼 % Bootstrap percentile CI for 𝛽.
Sol:
1) Resample with replacement m = 3000 times (m = # of Bootstrap samples, b = sample size of each Bootstrap sample, here b = n = 20):
$$x^{*1} = (x^{*1}_1,\ldots,x^{*1}_{20}),\quad x^{*2} = (x^{*2}_1,\ldots,x^{*2}_{20}),\ \ldots,\ x^{*m} = (x^{*m}_1,\ldots,x^{*m}_{20})$$
e.g., $x^{*1}$: 4.3, 4.3, 4.3, 10.8, 10.8, 10.8, 10.8, 17.9, 22.5, 42.3, 48.8, 48.8, 85.9, 131.7, 131.7, 150.4, 154.4, 154.4, 264.0, 265.6
2) Compute $\hat\theta^{*}_j = \bar x^{*j}$, $j = 1,\ldots,3000$, and order them: $\hat\theta^{*}_{(1)} \le \hat\theta^{*}_{(2)} \le \cdots \le \hat\theta^{*}_{(3000)}$.
3) The $100(1-\alpha)\%$ percentile Bootstrap CI for θ is:
$$\left(\hat\theta^{*}_{(m\alpha/2)},\ \hat\theta^{*}_{(m(1-\alpha/2))}\right).$$
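A sketch of the percentile-bootstrap steps above. The full n = 20 data set is only partially listed in these notes, so the `data` array below is truncated and therefore illustrative only.

```python
# Percentile bootstrap CI for the mean (sketch; `data` stands in for the full n = 20 sample).
import numpy as np

rng = np.random.default_rng(3)
data = np.array([131.7, 182.7, 73.3, 10.7, 150.4, 42.3, 22.2, 17.9,
                 264.0, 154.4, 4.3, 215.6, 61.9])   # values listed in the notes; the list is truncated there
m, alpha = 3000, 0.10
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean() for _ in range(m)])
lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
print(lo, hi)
```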
Chapter 5: Hypothesis Tests
STEPS for HT:
𝑪𝒐𝒎𝒑𝒖𝒕𝒂𝒕𝒊𝒐𝒏 𝑫𝒆𝒄𝒊𝒔𝒊𝒐𝒏
𝑷𝒓𝒐𝒃𝒍𝒆𝒎 𝑫𝒆𝒇𝒊𝒏𝒆 𝑬𝒙𝒑𝒆𝒓𝒊𝒎𝒆𝒏𝒕
→ → → → 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻 𝑏𝑎𝑠𝑒𝑑
𝑀𝑎𝑘𝑒 𝑎 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝐻 𝑣𝑠 𝐻 𝑐𝑜𝑙𝑙𝑒𝑐𝑡 𝑑𝑎𝑡𝑎 𝑥 , … , 𝑥 𝑋, 𝑆
𝑜𝑛 𝛼% 𝑣𝑎𝑙𝑢𝑒
Ex 5.1: An automobile company is testing whether a certain additive can help increase gas mileage. Without the additive, a car gets on average 25.0 mpg with $\sigma = 2.4$ mpg (σ is taken as known for simplicity). By using the additive, will the gas mileage increase? Assume the $x_i$'s $\sim N(\mu, 2.4^2)$.
Sol:
1) Problem: Effective or not?
2) Define: $H_0: \mu = 25.0$ (not effective — the null hypothesis) vs. $H_a: \mu > 25.0$ (effective — the alternative).
3) Experiment: Using the additive, 30 cars are tested on a road trip from Boston to LA, giving $x_1, x_2, \ldots, x_{30}$.
4) Computation/Decision: reject $H_0$ if $\bar x > \bar x^{*}$, where the critical value $\bar x^{*}$ satisfies
$$\alpha = P\!\left(Z > \frac{\bar x^{*} - 25}{2.4/\sqrt{30}}\right) \Rightarrow \frac{\bar x^{*} - 25}{2.4/\sqrt{30}} = Z_{0.05} \approx 1.64 \Rightarrow \bar x^{*} = 25.718.$$
Since the observed $\bar x = 26.3 > 25.718$, reject $H_0$.  #
* A "significant" result (rejecting $H_0$) does not prove that $H_a$ is true.
* If $H_0$ is NOT rejected, we say we "fail to reject $H_0$" ⇒ there is not enough evidence to overturn the presumption.
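A sketch of the decision rule in Ex 5.1 (known σ = 2.4, n = 30, α = 0.05), using scipy's normal quantile.

```python
# Critical value and decision for H0: mu = 25 vs Ha: mu > 25 with known sigma.
from scipy import stats

mu0, sigma, n, alpha = 25.0, 2.4, 30, 0.05
x_star = mu0 + stats.norm.ppf(1 - alpha) * sigma / n ** 0.5   # ~25.72
xbar_obs = 26.3
print(x_star, xbar_obs > x_star)   # reject H0 since 26.3 exceeds the critical value
```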
Def 5.1:
1) Any function of the observed data whose numerical value dictates whether H0 is accepted or rejected is called a test statistic, e.g., $\bar x$.
2) The set of values for the test statistic that results in the null hypothesis
being rejected is called the critical region, denoted by C
3) The particular point in C that separates the rejection region from the
acceptance region is called the critical value. eg. 𝑥 ∗ 25.718
Def 5.2: The probability that the test statistic lies in the critical region when 𝐻 is true is called the level
of significance (the size of the test), denoted by 𝛼.
Def 5.3:
1) TYPE I Error: Reject a true H0
2) TYPE II Error: Fail to reject a false H0
*To compute P[T II], we need to know the decision rule first.
𝐻 ∶ 𝜇 𝜇 , 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻 𝑎𝑡 𝛼
Eg., Consider: vs.
𝐻 ∶ 𝜇 𝜇 𝜇
̅ ∗
Test: 𝛼 𝑃 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻 | 𝐻 𝑃𝑋 𝑥 ∗ |𝜇 𝑥 𝑠~ 𝑁𝑜𝑟𝑚𝑎𝑙, ⊥ 𝑃 𝜇
√ √
𝑃𝑍 𝑍
𝑃 𝑇𝐼𝐼 𝑃 𝐴𝑐𝑐𝑒𝑝𝑡 𝐻 |𝐻
𝑥̅ 𝜇
𝑃 𝜎 𝑍 𝜇
√𝑛
𝑥̅ 𝜇 𝜇 𝜇
𝑃 𝜎 𝜎 𝑍 |𝜇
√𝑛 √𝑛
𝑃 𝑍 𝜇
√ √
Φ 𝑍
√
In Ex 5.1, if 𝐻 ∶ 𝜇 25.75, then
.
𝑃 𝑇 𝐼𝐼 Φ 𝑍 . . 0.4721 Not a good decision
√
**Simple and Composite Hypotheses:
Def 5.4: If a hypothesis completely specifies 𝑓 𝑥; 𝜇 , then it is called a simple hypothesis, otherwise it is
a composite hypothesis.
Eg,
𝐻 ∶ 𝜇 25 𝑆𝑖𝑚𝑝𝑙𝑒
𝐻 ∶ 𝜇 25.75 𝑆𝑖𝑚𝑝𝑙𝑒
𝐻 ∶ 𝜇 25 𝑆𝑖𝑚𝑝𝑙𝑒
𝐻 ∶ 𝜇 25 𝐶𝑜𝑚𝑝𝑜𝑠𝑖𝑡𝑒
𝐻 ∶ 𝜇 25 𝐶𝑜𝑚𝑝𝑜𝑠𝑖𝑡𝑒
𝐻 ∶ 𝜇 25 𝐶𝑜𝑚𝑝𝑜𝑠𝑖𝑡𝑒
𝐻 ∶ 𝜇 25 𝑆𝑖𝑚𝑝𝑙𝑒
𝐻 ∶ 𝜇 25 𝐶𝑜𝑚𝑝𝑜𝑠𝑖𝑡𝑒
**One‐side and Two‐side Tests:
𝐻 ∶ 𝜇 25 𝑢𝑝𝑝𝑒𝑟 𝑜𝑛𝑒 𝑠𝑖𝑑𝑒
𝐻 ∶ 𝜇 25 𝑡𝑒𝑠𝑡
𝐻 ∶ 𝜇 25 𝑙𝑜𝑤𝑒𝑟 𝑜𝑛𝑒 𝑠𝑖𝑑𝑒
𝐻 ∶ 𝜇 25 𝑡𝑒𝑠𝑡
𝐻 ∶ 𝜇 25 𝑡𝑤𝑜 𝑠𝑖𝑑𝑒
𝐻 ∶ 𝜇 25 𝑡𝑒𝑠𝑡
**For a composite null hypothesis, the size of the test is bounded by α.
∞, 𝜇 ←
Ω 𝐻 ∶ 𝜇 𝜇 𝑥̅ 𝜇
𝑣𝑠 , 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻 𝑎𝑡 𝑙𝑒𝑣𝑒𝑙 𝛼 𝑖𝑒. , 𝑍
Ω Ω 𝜇 ,∞ ← 𝐻 ∶ 𝜇 𝜇 𝜎 ⁄ √𝑛
1) We know that when 𝜇 𝜇 , 𝑃 𝑇 𝐼 𝛼
2) Now, need to show that 𝑃 𝑇 𝐼 𝛼 when 𝜇 𝜇
𝜇 𝜇 ̅
When ∶ 𝑃 𝑇𝐼 𝑃 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻 |𝐻 𝑃 ⁄ 𝑍 𝜇
𝜇 √
𝑥̅ 𝜇 𝜇 𝜇
𝑃 𝑍 𝜇
𝜎⁄√𝑛 𝜎⁄√𝑛