
3. The Statistical Setup

Ismaïla Ba
[email protected]
STAT 3100 - Winter 2024
Course Outline

1 Introduction

2 Some Additional Results

Introduction

θ : parameter (in general unknown).

x = (x1, . . . , xn) are realizations of random variables X = (X1, . . . , Xn)
from a population of interest.
We make assumptions about the distribution of X1, . . . , Xn (for
instance fX(·; θ)) in order to make inferences about the population
characteristics (mean, standard deviation, etc.).
For example, the Xi are iid N(µ, σ²) random variables, where θ = (µ, σ²)
is unknown and needs to be estimated.
Sometimes, we observe x1, . . . , xn, where xi = (xi1, . . . , xip)′. We
assume that these xi are realizations of multivariate random variables
Xi = (Xi1, . . . , Xip)′ with some multivariate distribution, such as the
multivariate normal, characterized by a mean vector µ and a
covariance matrix Σp×p. That is, Θ = {µ, Σp×p}.


Statistic

Definition 1
A statistic, denoted T = T(X1, . . . , Xn), is a function of observable
random variables X1, . . . , Xn that does not depend on unknown parameters
and that can be calculated once the random variables are observed. Note
that T is a random variable with realization t = T(x1, . . . , xn).

Example 1
Let θ (unknown) be a population parameter.
We conduct an experiment; the observations are x1, x2, . . . , xn.
Based on the experimental results x1, x2, . . . , xn, define θ̂ = f(x1, x2, . . . , xn).
The statistic θ̂ = f(X1, X2, . . . , Xn) is a random variable: it varies from
sample to sample, and its distribution informs users about the behaviour of θ̂.
Examples: θ̂ = X̄, S, . . .

Example 2

Let X1, . . . , Xn be a random sample from a population with some
distribution function F. The sample mean is defined as

    X̄ = (1/n) ∑_{i=1}^n Xi.

This is a random variable. Once X1, . . . , Xn have been observed and we
have x1, . . . , xn, the observed sample mean is x̄ = (1/n) ∑_{i=1}^n xi.
If E(Xi) = µ and V(Xi) = σ², then we have

    E(X̄) = E((1/n) ∑_{i=1}^n Xi) = (1/n) ∑_{i=1}^n E(Xi) = nµ/n = µ,

    V(X̄) = V((1/n) ∑_{i=1}^n Xi) = (1/n²) ∑_{i=1}^n V(Xi) = nσ²/n² = σ²/n,

where the variance calculation uses independence of the Xi.
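The two identities above are easy to check by simulation. Below is a minimal
sketch (not from the slides), assuming numpy is available; the N(5, 2²)
population, the sample size n = 30 and the number of replications are
arbitrary choices.

# Monte Carlo check of E(X̄) = µ and V(X̄) = σ²/n.
import numpy as np

rng = np.random.default_rng(3100)
mu, sigma, n, reps = 5.0, 2.0, 30, 200_000

# Each row is one sample of size n; each row mean is one realization of X̄.
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbar.mean())   # close to µ = 5
print(xbar.var())    # close to σ²/n = 4/30 ≈ 0.1333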
Remark: If T = (X1 + . . . + Xn) × σ², where σ² is the variance of the Xi,
then T is not observable (because σ² is unknown), even though the Xi are.
Example 3
The sample variance is defined as

    S² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)² = (1/(n−1)) (∑_{i=1}^n Xi² − nX̄²),   n > 1.

Since E(Xi²) = V(Xi) + E²(Xi) = σ² + µ² and E(X̄²) = µ² + σ²/n, we have

    E(S²) = (1/(n−1)) E(∑_{i=1}^n Xi² − nX̄²)
          = (1/(n−1)) (n(σ² + µ²) − n(µ² + σ²/n)) = σ².

One can also show that

    V(S²) = (1/n) (µ₄ − ((n−3)/(n−1)) σ⁴),   n > 1.
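As a quick sanity check on E(S²) = σ², the following sketch (not from the
slides; numpy assumed, with an arbitrary Exp(1) population where σ² = 1 and
n = 10) compares the (n − 1)-divisor estimator with the n-divisor version.

# Monte Carlo check that S² (divisor n − 1) is unbiased for σ²,
# while the divisor-n version is biased downward.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
samples = rng.exponential(scale=1.0, size=(reps, n))   # V(Xi) = 1

print(samples.var(axis=1, ddof=1).mean())   # close to σ² = 1
print(samples.var(axis=1, ddof=0).mean())   # close to (n − 1)σ²/n = 0.9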

Sampling distribution

Recall from the previous slide that µ₄ = E[(Xi − µ)⁴] is the fourth
central moment of X.

Definition 2
Suppose that we draw all possible samples of size n from a given
population.
Suppose further that we compute a statistic (e.g., a mean, standard
deviation) for each sample.
A sampling distribution is the probability distribution of this statistic.
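Definition 2 can be mimicked by simulation: draw many samples, compute the
same statistic on each, and inspect the resulting distribution. A sketch (not
from the slides; numpy assumed, with the sample maximum of n = 25 Uniform(0, 1)
observations as an arbitrary choice of statistic):

# Approximating the sampling distribution of a statistic by simulation.
import numpy as np

rng = np.random.default_rng(42)
n, reps = 25, 100_000
samples = rng.uniform(0.0, 1.0, size=(reps, n))

stat = samples.max(axis=1)   # one realization of the statistic per sample

print(stat.mean(), stat.std())                 # summary of the sampling distribution
print(np.quantile(stat, [0.05, 0.5, 0.95]))    # its approximate quantiles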


Estimator

Definition 3
An estimator of a population parameter θ is a statistic that is
thought to produce values close to θ in some sense and is denoted by
θ̂(X); that is, a statistic that is used to estimate θ.
The observed value θ̂(x) is called the estimate of θ.

Remark: The estimator is a random variable; the estimate is a constant!


Example 4
1 Let X1, . . . , Xn be n random variables with mean θ = µ. An estimator
of µ is, for example, given by

    µ̂(X) = (1/n) ∑_{i=1}^n Xi.

2 If, for example, we observe x = (2, 1.4, 4.2, 5.6), then

    µ̂(x) = (1/4)(2 + 1.4 + 4.2 + 5.6) = 3.3

is the estimate of µ based on x.
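In code, the estimator/estimate distinction is simply a function versus the
value it returns on the observed data. A sketch of Example 4 (numpy assumed):

# Estimator vs. estimate (Example 4).
import numpy as np

def mu_hat(sample):
    # The estimator: a rule that maps any sample to a number.
    return np.mean(sample)

x = np.array([2.0, 1.4, 4.2, 5.6])   # observed realizations
print(mu_hat(x))                     # the estimate: 3.3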


Properties of an estimator

Definition 4
Let X1, . . . , Xn be a random sample from some population with a
parameter θ, and suppose that θ̂ = θ̂(X1, . . . , Xn) is an estimator of θ.
The bias of θ̂ is defined to be

    Bias(θ̂) = B(θ̂) = E(θ̂) − θ.

If B(θ̂) = 0, we say that θ̂ is unbiased for θ. The mean squared error of
θ̂ is defined to be

    MSE(θ̂) = E[(θ̂ − θ)²] = V(θ̂) + B²(θ̂).

Remark: It is clear that if B(θ̂) = 0, then MSE(θ̂) = V(θ̂).
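Bias, variance and MSE can all be approximated by Monte Carlo, which also lets
us verify the decomposition MSE = V + B². A sketch (not from the slides; numpy
assumed), using the divisor-n variance estimator on an arbitrary N(0, 1)
population:

# Monte Carlo bias/variance/MSE of the divisor-n variance estimator.
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 10, 200_000, 1.0
samples = rng.normal(0.0, 1.0, size=(reps, n))

theta_hat = samples.var(axis=1, ddof=0)       # biased estimator of σ²

bias = theta_hat.mean() - sigma2              # B(θ̂), about −σ²/n = −0.1
var = theta_hat.var()                         # V(θ̂)
mse = np.mean((theta_hat - sigma2) ** 2)      # E[(θ̂ − θ)²]

print(bias, var, mse, var + bias**2)          # mse ≈ var + bias²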



We assume that we have a sequence of estimators, say θ̂1, θ̂2, . . .,
which usually represents a sequence of estimators of θ with increasing
sample size, so that θ̂n is based on a sample of size n.

We say that the sequence {θ̂n} is asymptotically unbiased for θ if
lim_{n→∞} E(θ̂n) = θ.

We say that the sequence {θ̂n} is consistent (or weakly consistent) for θ
if, for every ϵ > 0, lim_{n→∞} P(|θ̂n − θ| > ϵ) = 0.

In a probability class, you would say that θ̂n converges to θ in
probability; in statistics classes, we say that θ̂n is consistent for θ.
This says that, for large n, the distribution of θ̂n is concentrated
around θ.
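Consistency is visible in simulation: fix ϵ and watch the estimated
P(|θ̂n − θ| > ϵ) shrink as n grows. A sketch for θ̂n = X̄n (not from the
slides; numpy assumed, with an N(0, 1) population, ϵ = 0.1 and an arbitrary
grid of sample sizes):

# Estimated P(|X̄n − µ| > ϵ) for increasing n.
import numpy as np

rng = np.random.default_rng(7)
mu, eps, reps = 0.0, 0.1, 10_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # tends to 0 as n grows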


Markov inequality

If X is a random variable and u(x) is a non-negative real-valued
function, then, for any real c > 0,

    P(u(X) ≥ c) ≤ E[u(X)] / c.

In particular,

    P(|X| ≥ c) ≤ E[|X|] / c,

which is the Markov inequality.


Chebyshev’s inequality

If we let µ = E(X), σ² = V(X) < ∞ and take u(x) = |x − µ|², we obtain

    P(|X − µ| ≥ c) = P(|X − µ|² ≥ c²) ≤ E[|X − µ|²] / c² = σ² / c².

Taking c = kσ yields Chebyshev's inequality:

    P(|X − µ| ≥ kσ) = P(|X − µ|² ≥ k²σ²) ≤ E[|X − µ|²] / (k²σ²) = 1/k².

For example, the probability that X is more than k = 2 standard
deviations from µ is bounded above by 1/2² = 1/4.
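The bound is easy to test empirically, and it holds for any distribution with
finite variance. A sketch (not from the slides; numpy assumed) on an arbitrary,
skewed Exp(1) population, where µ = σ = 1:

# Empirical check of P(|X − µ| ≥ kσ) ≤ 1/k².
import numpy as np

rng = np.random.default_rng(2024)
x = rng.exponential(scale=1.0, size=1_000_000)   # µ = 1, σ = 1
mu, sigma = 1.0, 1.0

for k in (2, 3, 4):
    p = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, p, 1 / k**2)   # empirical probability vs. Chebyshev bound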


More on inequalities

Let {θ̂n} be a sequence of estimators of θ. We have

    P(|θ̂n − θ| ≥ c) ≤ E[|θ̂n − θ|²] / c² = MSE(θ̂n) / c².

Thus, if MSE(θ̂n) → 0 as n → ∞, then θ̂n is consistent for θ. This is
commonly referred to as mean squared error consistency.
If a sequence of estimators is MSE consistent, then it is consistent.
It is not true that all consistent estimators are MSE consistent.


Example 5
Going back to Example 2, if V(Xi) = σ² < ∞, we have that

    E(X̄n) = E[(X1 + . . . + Xn)/n] = µ and V(X̄n) = σ²/n.

Thus, X̄n is unbiased and MSE(X̄n) = V(X̄n) = σ²/n. Hence,

    P(|X̄n − µ| ≥ ϵ) ≤ σ²/(nϵ²) → 0 as n → ∞.

Since this is true for all ϵ > 0, we have that X̄n is consistent for µ.


Exercises

Exercise 1
For a random sample X1, . . . , Xn such that E(Xi⁴) < ∞, show that S² is
consistent for σ².

Exercise 2
Suppose that X1, X2, . . . are iid Exp(β) random variables and define

    X̄n = (1/n) ∑_{i=1}^n Xi and Sn² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄n)².

1 Show that X̄n is unbiased and consistent for β and that Sn² is unbiased
and consistent for β².
2 Since X̄n is unbiased for β, we could consider using X̄n² as an estimator
of β² instead of Sn². Show that X̄n² is asymptotically unbiased for β².
What is the bias?
Some Additional Results


Convergence in Probability to a Constant

Let {Yn}∞_{n=1} be a sequence of random variables. We say that Yn
converges to a constant c in probability if

    lim_{n→∞} P(|Yn − c| > ϵ) = 0, for all ϵ > 0.

Remark
To say that Yn converges to c in probability, we write Yn →P c.
If the Yn are all estimators of c, we will say that {Yn} is consistent for c.

Example 6
Let Yn ~ Exp(1/n), with mean 1/n. Show that Yn →P 0. We have

    lim_{n→∞} P(|Yn − 0| > ϵ) = lim_{n→∞} P(Yn > ϵ) = lim_{n→∞} e^{−nϵ} = 0, for all ϵ > 0.
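A simulation version of Example 6 (not from the slides; numpy assumed). Note
the parametrization: the slide's Exp(1/n) has mean 1/n, i.e. rate n, so that
P(Yn > ϵ) = e^{−nϵ}; ϵ and the grid of n values are arbitrary choices.

# Empirical P(Yn > ϵ) against the exact value e^{−nϵ}.
import numpy as np

rng = np.random.default_rng(5)
eps, reps = 0.05, 200_000

for n in (1, 10, 100):
    y = rng.exponential(scale=1.0 / n, size=reps)   # mean 1/n, rate n
    print(n, np.mean(y > eps), np.exp(-n * eps))    # both tend to 0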

Proposition
Let a, b, c, d be real constants. Let {Xn} and {Yn} be sequences of
random variables such that Xn →P c and Yn →P d. Then,
1 aXn + bYn →P ac + bd.
2 XnYn →P cd.
3 If g(·) is continuous at c, then g(Xn) →P g(c).

Example 7
Let Xn →P c ∈ R and let g(·) be a real-valued function, continuous at c.
1 Provided c ≠ 0, 1/Xn →P 1/c (take g(x) = 1/x).
2 Provided c > 0, √Xn →P √c (take g(x) = √x).

Law of Large Numbers (LLN)

LLN (Khinchin's version)
If X1, X2, . . . are iid random variables such that µ = E(Xi) < ∞, then

    lim_{n→∞} P(|X̄n − µ| > ϵ) = 0, for all ϵ > 0.

That is, X̄n is consistent for µ, or X̄n →P µ.

Let Yi = Xi^k for some k ∈ N and all i ∈ N, and define
Sn = Y1 + Y2 + . . . + Yn, where E(Yi) < ∞. Then

    (1/n) ∑_{i=1}^n Xi^k is consistent for E(Xi^k) = µ′_k.

Remark: For a random variable X, if E(X^k) < ∞, then E(X^j) < ∞ (and
so E[(X − µ)^j] < ∞) for j = 1, . . . , k.

We define the sample moments for a random sample X1, . . . , Xn as
follows:

    M_k = (1/n) ∑_{i=1}^n Xi^k,   k = 1, 2, . . .

Since n and the Xi are finite (observable), M_k will always exist.
When µ′_k exists, M_k is consistent for µ′_k.
Note that we have lost the sample size in this notation.
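A sketch of the consistency of the sample moments (not from the slides; numpy
assumed), on an arbitrary Exp(1) population, where µ′_k = E(X^k) = k!:

# M_k = (1/n) ∑ Xi^k converging to µ′_k = k! for an Exp(1) population.
import numpy as np
from math import factorial

rng = np.random.default_rng(11)

for n in (100, 10_000, 1_000_000):
    x = rng.exponential(scale=1.0, size=n)
    for k in (1, 2, 3):
        print(n, k, np.mean(x ** k), factorial(k))   # M_k vs. µ′_k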


Convergence in Distribution

A sequence of random variables X1, X2, . . . with respective CDFs FXn
converges to X (with CDF FX) in distribution, written Xn →d X, if

    lim_{n→∞} FXn(t) = FX(t)

at all t for which FX is continuous.

Remark
Convergence in distribution is weaker than convergence in
probability.


Slutsky’s Theorem

Proposition
Let {Xn}∞_{n=1} and {Yn}∞_{n=1} be sequences of random variables such
that Xn →d X and Yn →P c, where X is a random variable and c is a
constant. Then
1 Xn + Yn →d X + c.
2 XnYn →d cX.
3 Provided c ≠ 0, Xn/Yn →d X/c.


Univariate Delta Method

Proposition
Let {Xn}∞_{n=1} be a sequence of random variables. Let θ, σ² ∈ R be
constants with σ² > 0, and let g(·) be a real-valued function such that
g′(θ) exists and g′(θ) ≠ 0. If

    √n (Xn − θ) →d N(0, σ²),

then

    √n (g(Xn) − g(θ)) →d N(0, σ² [g′(θ)]²).
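A simulation sketch of the delta method (not from the slides; numpy assumed),
taking g(x) = x² and an Exp(1) population (µ = σ = 1, so the limiting standard
deviation is σ|g′(µ)| = 2), and drawing X̄n directly via the fact that a sum of
n iid Exp(1) variables is Gamma(n, 1):

# √n (g(X̄n) − g(µ)) should be approximately N(0, σ²[g′(µ)]²).
import numpy as np

rng = np.random.default_rng(9)
n, reps = 2_000, 200_000
mu, sigma = 1.0, 1.0
g = lambda x: x ** 2                  # g′(µ) = 2µ = 2

# X̄n for an Exp(1) sample: sum of n iid Exp(1) is Gamma(n, 1).
xbar = rng.gamma(shape=n, scale=1.0, size=reps) / n
z = np.sqrt(n) * (g(xbar) - g(mu))

print(z.mean(), z.std())              # approximately 0 and σ|g′(µ)| = 2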
