
Jizhou Kang

STAT 205B: Classical Inference
To Shao Yong (邵雍),
for sharing a secret joy with simple words;

When the moon reaches the heart of the sky, when the wind comes over the face of the water:
such pure flavor of feeling, I reckon, is known to few.

and

To Hongzhi Zhengjue (宏智禅师),


for sharing the peace at the end of life with simple words.

Dreams, illusions, flowers of emptiness: sixty-seven years;
a white bird vanishes, and autumn waters merge with the sky.
Contents

Preface

Random Samples, Special Distribution (Lecture on 01/07/2020)
Preface

This is my e-version of the notes for the classical inference class at UCSC, taught by Prof. Bruno Sanso in Winter 2020. The notes mainly contain lecture notes, relevant extra material (proofs, examples, etc.), and solutions to selected problems, written in my own style. The notes are ordered chronologically. The goal is to summarize all relevant material and make it easily accessible in the future.

The textbook we used is Casella & Berger's famous book: Statistical Inference, Second Edition. Most of the material in these notes is from the textbook, although additional material is added with references. For more information about this class, one may refer to the UCSC course website¹.

Since we are a Bayesian statistics department, classical inference is not treated that seriously here. Many technical proofs are not required, though I find them interesting to know. Therefore, I will try to include other relevant material as much as possible, and use different colors to separate levels of importance.

[Special Thanks…]

[Other things to be added later]
Jizhou Kang
01/11/2020

¹ https://stat205b-winter20-01.courses.soe.ucsc.edu/

0 Random Samples, Special Distribution (Lecture on 01/07/2020)

Often, the data collected in an experiment consist of several observations on a variable of interest. Random sampling is the model for data collection that is often used to describe this situation.
Definition 0.1 (Random Sample). The random variables $X_1, \ldots, X_n$ are called a random sample of size $n$ from the population $f(x)$ if $X_1, \ldots, X_n$ are mutually independent random variables and the marginal pdf or pmf of each $X_i$ is the same function $f(x)$. Alternatively, $X_1, \ldots, X_n$ are called independent and identically distributed random variables with pdf or pmf $f(x)$. This is commonly abbreviated to iid random variables.

The joint pdf or pmf of $X_1, \ldots, X_n$ is given by

$$f(x_1, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n) = \prod_{i=1}^{n} f(x_i) \tag{0.1}$$

Since $X_1, \ldots, X_n$ are identically distributed, all the marginal densities $f(x)$ are the same function. Furthermore, if the population pdf or pmf is a member of a parametric family, with pdf or pmf given by $f(x|\theta)$, then the joint pdf or pmf is

$$f(x_1, \ldots, x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta) \tag{0.2}$$
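
As a quick illustration (an example added here, not from the lecture): for an iid Bernoulli($\theta$) sample, factoring the joint pmf by (0.2) gives

$$f(x_1, \ldots, x_n|\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_i x_i}(1-\theta)^{n - \sum_i x_i},$$

so the joint pmf depends on the data only through $\sum_i x_i$.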

Random sampling is also referred to as infinite population sampling.


If you sample $X_1, \ldots, X_n$ sequentially, the independence assumption indicates that the observed result $x_1$ of $X_1$ will not influence the observed result $x_2$ of $X_2$. "Removing" $x_1$ from an infinite population does not change the population.

When sampling is from a finite population, the result may or may not be a random sample. If the sampling is done with replacement, then it is a random sample. If it is done without replacement, then it is not a random sample, because it violates the independence assumption in Definition 0.1: $P(X_2 = y|X_1 = y) = 0$ while $P(X_2 = y|X_1 = x) \neq 0$ for $x \neq y$, which means the distribution of $X_2$ does depend on the value of $X_1$. However, $X_1, \ldots, X_n$ are still identically distributed, which can be proved by the law of total probability; see the computation below. This kind of sampling is sometimes called simple random sampling. If the population size $N$ is large compared to the sample size $n$, the samples are nearly independent and probabilities can be approximated by assuming independence.
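
Filling in the total-probability step (assuming, for concreteness, a population of $N$ distinct values, each equally likely to be drawn first): for sampling without replacement,

$$P(X_2 = y) = \sum_{x \neq y} P(X_2 = y|X_1 = x)P(X_1 = x) = (N-1)\cdot\frac{1}{N-1}\cdot\frac{1}{N} = \frac{1}{N} = P(X_1 = y),$$

so $X_2$ has the same marginal distribution as $X_1$, and the same argument applies to each $X_i$.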

When a sample $X_1, \ldots, X_n$ is drawn, some summary of the values is usually computed. Any well-defined summary may be expressed mathematically as a function $T(X_1, \ldots, X_n)$ whose domain includes the sample space of the random vector $(X_1, \ldots, X_n)$. The function $T$ may be real-valued or vector-valued; thus the summary is a random variable (or vector), $Y = T(X_1, \ldots, X_n)$.
Definition 0.2 (Statistic). Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the population and let $T(X_1, \ldots, X_n)$ be a real-valued or vector-valued function whose domain includes the sample space of $(X_1, \ldots, X_n)$. Then the random variable or random vector $Y = T(X_1, \ldots, X_n)$ is called a statistic. The probability distribution of a statistic $Y$ is called the sampling distribution of $Y$.

The only restriction on a statistic is that it cannot be a function of parameters. The sample mean, variance, and standard deviation are often used and provide good summaries of the sample.
Definition 0.3 (Sample Mean). The sample mean is defined as

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{0.3}$$

Definition 0.4 (Sample Variance and Standard Deviation). The sample variance is defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \tag{0.4}$$

and the sample standard deviation is defined as

$$S = \sqrt{S^2} \tag{0.5}$$
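
A quick numerical sketch of these definitions (assuming NumPy is available; the data vector is made up for illustration). Note that np.var and np.std default to the biased divisor $n$, so ddof=1 is needed to match (0.4) and (0.5):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])

xbar = x.mean()        # sample mean, equation (0.3)
s2 = x.var(ddof=1)     # sample variance with divisor n-1, equation (0.4)
s = x.std(ddof=1)      # sample standard deviation, equation (0.5)

# The same S^2 computed directly from the definition
s2_direct = ((x - xbar) ** 2).sum() / (len(x) - 1)
assert np.isclose(s2, s2_direct)
```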

The sample mean minimizes the total quadratic difference, i.e.,

$$\min_a \sum_{i=1}^{n}(x_i - a)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{0.6}$$

Equation (0.6) can be easily proved by the classic trick of adding and subtracting $\bar{x}$ inside the brackets, then applying another classic property of the sample mean:

$$\sum_{i=1}^{n}(x_i - \bar{x}) = 0 \tag{0.7}$$
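
Writing out the trick (a step the notes leave implicit):

$$\sum_{i=1}^{n}(x_i - a)^2 = \sum_{i=1}^{n}\big((x_i - \bar{x}) + (\bar{x} - a)\big)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - a)\sum_{i=1}^{n}(x_i - \bar{x}) + n(\bar{x} - a)^2.$$

The cross term vanishes by (0.7), and $n(\bar{x} - a)^2 \geq 0$ with equality if and only if $a = \bar{x}$, which proves (0.6).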

Another useful property of the sample mean and variance is

$$(n-1)s^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \tag{0.8}$$
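
The same expansion gives (0.8) directly, since $\sum_i x_i = n\bar{x}$:

$$(n-1)s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2.$$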

Lemma 0.1. Let $X_1, \ldots, X_n$ be a random sample from a population and let $g(x)$ be a function such that $E(g(X_1))$ and $Var(g(X_1))$ exist. Then

$$E\left(\sum_{i=1}^{n} g(X_i)\right) = nE(g(X_1)), \qquad Var\left(\sum_{i=1}^{n} g(X_i)\right) = nVar(g(X_1)) \tag{0.9}$$

Proof. The first part of (0.9) follows easily from the linearity of expectation. To prove the second part, note that

$$Var\left(\sum_{i=1}^{n} g(X_i)\right) = E\left[\sum_{i=1}^{n} g(X_i) - E\left(\sum_{i=1}^{n} g(X_i)\right)\right]^2 = E\left[\sum_{i=1}^{n}\big(g(X_i) - Eg(X_i)\big)\right]^2 \tag{0.10}$$

Expanding the square in (0.10), there are $n$ terms of the form $(g(X_i) - Eg(X_i))^2$, $i = 1, \ldots, n$, and the expectation of each is just $Var(g(X_1))$. The remaining terms are all of the form $(g(X_i) - Eg(X_i))(g(X_j) - Eg(X_j))$ with $i \neq j$, whose expectation is $Cov(g(X_i), g(X_j)) = 0$ by independence.
Theorem 0.1. Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$. Then
• a. $E\bar{X} = \mu$
• b. $Var(\bar{X}) = \frac{\sigma^2}{n}$
• c. $ES^2 = \sigma^2$
(The sample mean and variance are unbiased estimators!)

Proof. For (a), let $g(X_i) = X_i/n$, so $Eg(X_i) = \mu/n$; then apply Lemma 0.1.

For (b), $Var(g(X_i)) = \sigma^2/n^2$; then by Lemma 0.1, $Var(\bar{X}) = \frac{\sigma^2}{n}$.

Finally, for (c), we have

$$ES^2 = E\left(\frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]\right) = \frac{1}{n-1}\left(nEX_1^2 - nE\bar{X}^2\right) = \frac{1}{n-1}\left(n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right) = \sigma^2 \tag{0.11}$$

where the last step uses the fact that $EY^2 = Var(Y) + (EY)^2$ for any random variable $Y$.
From (a) and (c) of Theorem 0.1, the sample mean and sample variance are unbiased estimators of the population mean and variance.
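
As a sanity check (a simulation sketch added here, assuming NumPy; the parameter values are arbitrary), all three claims of Theorem 0.1 can be verified by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 200_000

# reps independent samples of size n from N(mu, sigma^2)
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

print(xbar.mean())   # ~ mu            (Theorem 0.1a)
print(xbar.var())    # ~ sigma^2 / n   (Theorem 0.1b)
print(s2.mean())     # ~ sigma^2       (Theorem 0.1c)
```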

Theorem 0.2. Let $X_1, \ldots, X_n$ be a random sample from a population with pdf $f_X(x)$, and let $\bar{X}$ denote the sample mean. Then, regardless of whether the mgf of $X$ exists,

$$f_{\bar{X}}(x) = nf_{X_1+\cdots+X_n}(nx) \tag{0.12}$$

Furthermore, if the mgf of $X$ does exist, denoted $M_X(t)$, then

$$M_{\bar{X}}(t) = \left[M_X\left(\frac{t}{n}\right)\right]^n \tag{0.13}$$

(This theorem combines Exercise 5.5 and Theorem 5.2.7 in Casella and Berger (2002).)

Proof. Let $Y = X_1 + \cdots + X_n$, so that $Y = n\bar{X}$, and, with $y = nx$,

$$f_{\bar{X}}(x) = f_Y(nx)\left|\frac{dy}{dx}\right| = nf_Y(nx) \tag{0.14}$$

For mgfs,

$$M_{\bar{X}}(t) = Ee^{t\bar{X}} = Ee^{t(X_1+\cdots+X_n)/n} = Ee^{(t/n)Y} = \left[M_X\left(\frac{t}{n}\right)\right]^n \tag{0.15}$$

where the last step uses the iid property of random samples.
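
A standard example of (0.13) (added here, not worked in the lecture): if $X_i \sim N(\mu, \sigma^2)$, whose mgf is $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$, then

$$M_{\bar{X}}(t) = \left[\exp\left(\frac{\mu t}{n} + \frac{\sigma^2 t^2}{2n^2}\right)\right]^n = \exp\left(\mu t + \frac{(\sigma^2/n)t^2}{2}\right),$$

which is the mgf of $N(\mu, \sigma^2/n)$; hence $\bar{X} \sim N(\mu, \sigma^2/n)$, consistent with Theorem 0.1.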

The convolution formula is useful in finding the pdf of $\bar{X}$. If $X$ and $Y$ are independent random variables with pdfs $f_X(x)$ and $f_Y(y)$, then the pdf of $Z = X + Y$ is

$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(\omega)f_Y(z - \omega)\,d\omega \tag{0.16}$$
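
For instance (an illustration added here, not from the lecture), if $X, Y \overset{iid}{\sim} Exp(\lambda)$, then for $z > 0$ the convolution formula gives

$$f_Z(z) = \int_0^z \lambda e^{-\lambda\omega}\,\lambda e^{-\lambda(z-\omega)}\,d\omega = \lambda^2 z e^{-\lambda z},$$

the Gamma$(2, \lambda)$ density; the integration range shrinks to $(0, z)$ because each factor vanishes for negative arguments.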

For special distributions, the first and most important one to consider is the multivariate normal distribution (MVN for short).

Definition 0.5 (Multivariate Normal Distribution). Let $\mu \in \mathbb{R}^p$ and let $\Sigma_{p \times p}$ be positive definite. A random vector $X \in \mathbb{R}^p$ has a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has pdf

$$f(\mathbf{x}) = |2\pi\Sigma|^{-\frac{1}{2}}\exp\left[-\frac{1}{2}(\mathbf{x} - \mu)^T\Sigma^{-1}(\mathbf{x} - \mu)\right] \tag{0.17}$$

for $\mathbf{x} \in \mathbb{R}^p$, denoted $X \sim N_p(\mu, \Sigma)$.

Recall that the moment generating function and the characteristic function of a random vector $X$ are defined as (0.18) and (0.19), respectively:

$$M_X(\mathbf{t}) = E(e^{\mathbf{t}^T X}) \tag{0.18}$$

$$\Phi_X(\mathbf{t}) = E(e^{i\mathbf{t}^T X}) \tag{0.19}$$

If $X$ and $Y$ are independent, then we have the following properties of the mgf and characteristic function:

$$M_{X+Y}(\mathbf{t}) = M_X(\mathbf{t})M_Y(\mathbf{t}) \tag{0.20}$$

$$\Phi_{X+Y}(\mathbf{t}) = \Phi_X(\mathbf{t})\Phi_Y(\mathbf{t}) \tag{0.21}$$

Finally, the mgf and characteristic function of a multivariate normally distributed random vector $X$ are given by

$$M_X(\mathbf{t}) = \exp\left(\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}\right) \tag{0.22}$$

$$\Phi_X(\mathbf{t}) = \exp\left(i\mathbf{t}^T\mu - \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}\right) \tag{0.23}$$
Theorem 0.3. Suppose $X \sim N_p(\mu, \Sigma)$. Then for any matrix $B \in \mathbb{R}^{k \times p}$ with rank $k \leq p$, the vector $Y = BX$ satisfies $Y \sim N_k(B\mu, B\Sigma B^T)$.
(This theorem is Theorem 4.4a in Rencher and Schaalje (2007).)

Proof. The mgf of $Y$ is, by definition,

$$M_Y(\mathbf{t}) = E(e^{\mathbf{t}^T Y}) = E(e^{\mathbf{t}^T BX}) = M_X(B^T\mathbf{t}) \tag{0.24}$$

From (0.22) we have the form of $M_X(\mathbf{t})$; therefore

$$M_Y(\mathbf{t}) = \exp\left(\mathbf{t}^T B\mu + \frac{1}{2}\mathbf{t}^T B\Sigma B^T\mathbf{t}\right) \tag{0.25}$$

which is the mgf of an $N_k(B\mu, B\Sigma B^T)$ random vector. Thus, the theorem is proved.
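
A simulation sketch of Theorem 0.3 (assumes NumPy; the particular $\mu$, $\Sigma$, and $B$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -2.0, 0.5])                  # mean of X in R^3
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])              # positive definite covariance
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])                 # 2x3 matrix of rank 2

X = rng.multivariate_normal(mu, Sigma, size=500_000)  # rows are draws of X
Y = X @ B.T                                           # Y = BX, draw by draw

print(Y.mean(axis=0))           # ~ B @ mu
print(np.cov(Y, rowvar=False))  # ~ B @ Sigma @ B.T
print(B @ mu, B @ Sigma @ B.T)  # theoretical values for comparison
```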
Bibliography

Casella, G. and Berger, R. (2002). Statistical Inference. Duxbury Resource Center, Belmont, CA, 2nd edition. ISBN 978-0534243128.

Rencher, A. and Schaalje, B. (2007). Linear Models in Statistics. John Wiley and Sons, Ltd, 2nd edition. ISBN 978-0470192610.
