
2. Getting Started with Statistics

Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology

3/2/20

ISYE 6739
Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Introduction to Descriptive Statistics

Lesson 2.1 — Introduction to Descriptive Statistics

What’s Coming Up:


Three high-level lessons on what Statistics is (not involving much math).
Several lessons on estimating parameters of probability distributions.
One lesson on certain distributions that will come up in subsequent Statistics modules: the normal, t, χ², and F.

Statistics forms a rational basis for decision-making using observed or experimental data. We make these decisions in the face of uncertainty.

Statistics helps us answer questions concerning:


The analysis of one population (or system).
The comparison of many populations.

ISYE 6739
Introduction to Descriptive Statistics

Examples:
Election polling.
Coke vs. Pepsi.
The effect of cigarette smoking on the probability of getting cancer.
The effect of a new drug on the probability of contracting hepatitis.
What’s the most popular TV show during a certain time period?
The effect of various heat-treating methods on steel tensile strength.
Which fertilizers improve crop yield?
King of Siam — etc., etc., etc.

Idea (Election polling example): We can’t poll every single voter. Thus, we
take a sample of data from the population of voters, and try to make a
reasonable conclusion based on that sample.

ISYE 6739
Introduction to Descriptive Statistics

Statistics tells us how to conduct the sampling (i.e., how many observations to
take, how to take them, etc.), and then how to draw conclusions from the
sampled data.

Types of Data
Continuous variables: Can take on any real value in a certain
interval. For example, the lifetime of a lightbulb or the weight of a
newborn child.
Discrete variables: Can only take on specific values. E.g., the number
of accidents this week at a factory or the possible rolls of a pair of dice.
Categorical variables: These data are not typically numerical.
What’s your favorite TV show during a certain time slot?

ISYE 6739
Introduction to Descriptive Statistics

Plotting Data

A picture is worth 1000 words. Always plot data before doing anything else,
if only to identify any obvious issues such as nonstandard distributions,
missing data points, outliers, etc.

Histograms provide a quick, succinct look at what you are dealing with. If
you take enough observations, the histogram will eventually converge to the
true distribution. But sometimes choosing the optimal number of cells is a
little tricky — like Goldilocks!
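
To make this concrete, here is a minimal sketch (not from the original slides; it assumes Python with numpy and matplotlib) that plots a density-scaled histogram of simulated Exp(1) data against the true pdf, so you can see the histogram settling down toward the true distribution and experiment with the number of cells.

```python
# Minimal sketch (assumptions: numpy and matplotlib installed): density-scaled
# histogram of simulated Exp(1) data, overlaid with the true pdf.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)

plt.hist(x, bins=40, density=True, alpha=0.5)  # try different bin counts -- the Goldilocks issue
grid = np.linspace(0, x.max(), 200)
plt.plot(grid, np.exp(-grid))                  # true Exp(1) pdf, e^{-x}
plt.xlabel("x")
plt.ylabel("density")
plt.show()
```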

ISYE 6739
Summarizing Data

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Summarizing Data

Lesson 2.2 — Summarizing Data

In addition to plotting data, how do we summarize data?

It’s nice to have lots of data. But sometimes it’s too much of a good thing!
Need to summarize.

Example: Grades on a test (i.e., raw data):

23 62 91 83 82 64 73 94 94 52
67 11 87 99 37 62 40 33 80 83
99 90 18 73 68 75 75 90 36 55

ISYE 6739
Summarizing Data

Stem-and-Leaf Diagram of grades. Easy way to write down all of the data.
Saves some space, and looks like a sideways histogram.

9 9944100
8 73320
7 5533
6 87422
5 52
4 0
3 763
2 3
1 81
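
For what it’s worth, the display above is easy to reproduce programmatically. Here is a minimal Python sketch (an illustration, not part of the slides) that builds the same stem-and-leaf diagram from the raw grades.

```python
# Minimal sketch: stem-and-leaf diagram of the 30 test grades.
from collections import defaultdict

grades = [23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
          67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
          99, 90, 18, 73, 68, 75, 75, 90, 36, 55]

stems = defaultdict(list)
for g in grades:
    stems[g // 10].append(g % 10)          # stem = tens digit, leaf = units digit

for stem in sorted(stems, reverse=True):   # print high stems first, like the slide
    leaves = "".join(str(d) for d in sorted(stems[stem], reverse=True))
    print(f"{stem} | {leaves}")
```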

ISYE 6739
Summarizing Data

Grouped Data

Range     Freq.   Cumul. Freq.   Proportion of observations so far
0–20        2          2              2/30
21–40       5          7              7/30
41–60       2          9              9/30
61–80      10         19             19/30
81–100     11         30              1

ISYE 6739
Summarizing Data

Summary Statistics:

n = 30 observations.

If Xi is the ith score, then the sample mean is

    X̄ ≡ (1/n) ∑_{i=1}^n Xi = 66.5.

The sample variance is

    S² ≡ (1/(n−1)) ∑_{i=1}^n (Xi − X̄)² = 630.6.

Remark: Before you take any observations, X̄ and S² must be regarded as random variables.
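
A quick sanity check of these two numbers, as a minimal Python sketch (plain Python, no libraries; the variable names are ours, not the slides’):

```python
# Compute the sample mean and sample variance of the 30 test grades.
grades = [23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
          67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
          99, 90, 18, 73, 68, 75, 75, 90, 36, 55]

n = len(grades)
xbar = sum(grades) / n                                  # sample mean, about 66.5
s2 = sum((x - xbar) ** 2 for x in grades) / (n - 1)     # sample variance, about 630.6
print(round(xbar, 1), round(s2, 1))
```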

ISYE 6739
Summarizing Data

In general, suppose that we sample iid data X1, . . . , Xn from the population of interest.

Example: Xi is the lifespan of the ith lightbulb we observe.

We’re most interested in measuring the “center” and “spread” of the underlying distribution of the data.

Measures of Central Tendency:


Sample Mean: X̄ = ∑_{i=1}^n Xi / n.

Sample Median: The “middle” observation when the Xi’s are arranged numerically.

ISYE 6739
Summarizing Data

Example: 16, 7, 83 gives a median of 16.

Example: 16, 7, 83, 20 gives a “reasonable” median of (16 + 20)/2 = 18.

Remark: The sample median is less susceptible to “outlier” data than the
sample mean. One bad number can spoil the sample mean’s entire day.

Example: 7, 7, 7, 672, 7 results in a sample mean of 140 and a sample median of 7.

Sample Mode: “Most common” value. Not the most useful measure
sometimes.

Example: 16, 7, 20, 83, 7 gives a mode of 7.
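
These toy examples are easy to verify with Python’s standard statistics module (a minimal sketch, not from the slides):

```python
# Reproduce the median, mean, and mode toy examples above.
import statistics

print(statistics.median([16, 7, 83]))        # 16
print(statistics.median([16, 7, 83, 20]))    # 18.0 (average of the two middle values)
print(statistics.mean([7, 7, 7, 672, 7]))    # 140
print(statistics.median([7, 7, 7, 672, 7]))  # 7
print(statistics.mode([16, 7, 20, 83, 7]))   # 7
```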

ISYE 6739
Summarizing Data

Measures of Variation (dispersion, spread)

Sample Variance:

    S² ≡ (1/(n−1)) ∑_{i=1}^n (Xi − X̄)² = (1/(n−1)) (∑_{i=1}^n Xi² − nX̄²),

the latter expression being easier to compute.

Sample Standard Deviation: S = +√S².

Sample Range: max_i Xi − min_i Xi.

ISYE 6739
Summarizing Data

Remark: Suppose the data takes p different values X1, . . . , Xp, with frequencies f1, . . . , fp, respectively.

How to calculate X̄ and S² quickly?

    X̄ = ∑_{j=1}^p fj Xj / n   and   S² = (∑_{j=1}^p fj Xj² − nX̄²)/(n − 1).

Example: Suppose we roll a die 10 times.

    Xj   1  2  3  4  5  6
    fj   2  1  1  3  0  3

Then X̄ = (2 · 1 + 1 · 2 + · · · + 3 · 6)/10 = 3.7, and S² = 3.789. □
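
A minimal Python sketch (our own helper function, not from the slides) that implements this frequency shortcut and reproduces the die example:

```python
# Frequency-data shortcut for the sample mean and sample variance.
def freq_mean_var(values, freqs):
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(values, freqs)) / n
    s2 = (sum(f * x**2 for x, f in zip(values, freqs)) - n * xbar**2) / (n - 1)
    return xbar, s2

xbar, s2 = freq_mean_var([1, 2, 3, 4, 5, 6], [2, 1, 1, 3, 0, 3])
print(xbar, round(s2, 3))   # 3.7 3.789
```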

ISYE 6739
Summarizing Data

Remark: If the individual observations can’t be determined in frequency distributions, you might just break the observations up into c intervals.

Example: Suppose c = 3, where we denote the midpoint of the jth interval by mj, j = 1, . . . , c, and the total sample size is n = ∑_{j=1}^c fj = 30.

    Xj interval   mj    fj
    100–150       125   10
    150–200       175   15
    200–300       250    5

    X̄ ≈ ∑_{j=1}^c fj mj / n = 170.833   and   S² ≈ (∑_{j=1}^c fj mj² − nX̄²)/(n − 1) = 1814. □

ISYE 6739
Candidate Distributions

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Candidate Distributions

Lesson 2.3 — Candidate Distributions

Time to make an informed guess about the type of probability distribution we’re dealing with. We’ll look at more-formal methodology for fitting distributions later in the course when we do goodness-of-fit tests. But for now, some preliminary things we should think about:

Is the data from a discrete, continuous, or mixed distribution?
Univariate/multivariate?
How much data is available?
Are experts around to ask about the nature of the data?
What if we do not have much/any data — can we at least guess at a good distribution?

ISYE 6739
Candidate Distributions

If the distribution is discrete, then we have a number of familiar choices to select from.
Bernoulli(p) (success with probability p)
Binomial(n, p) (number of successes in n Bern(p) trials)
Geometric(p) (number of Bern(p) trials until first success)
Negative Binomial (number of Bern(p) trials until multiple successes)
Poisson(λ) (counts the number of arrivals over time)
Empirical (the all-purpose “sample” distribution based on the histogram)

ISYE 6739
Candidate Distributions

If the data suggest a continuous distribution. . . .


Uniform (not much is known from the data, except perhaps the minimum
and maximum possible values)
Triangular (at least we have an idea regarding the minimum, maximum,
and “most likely” values)
Exponential(λ) (e.g., interarrival times from a Poisson process)
Normal (a good model for heights, weights, IQs, sample means, etc.)
Beta (good for specifying bounded data)
Gamma, Weibull, Gumbel, lognormal (reliability data)
Empirical (our all-purpose friend)

ISYE 6739
Introduction to Estimation

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Introduction to Estimation

Lesson 2.4 — Introduction to Estimation

Definition: A statistic is a function of the observations X1, . . . , Xn, and not explicitly dependent on any unknown parameters.

Examples of statistics: X̄ and S², but not (X̄ − µ)/σ.

Statistics are random variables. If we take two different samples, we’d expect to get two different values of a statistic.

A statistic is usually used to estimate some unknown parameter from the underlying probability distribution of the Xi’s.

Examples of parameters: µ, σ 2 .

ISYE 6739
Introduction to Estimation

Let X1, . . . , Xn be iid RV’s and let T(X) ≡ T(X1, . . . , Xn) be a statistic based on the Xi’s. Suppose we use T(X) to estimate some unknown parameter θ. Then T(X) is called a point estimator for θ.

Examples: X̄ is usually a point estimator for the mean µ = E[Xi], and S² is often a point estimator for the variance σ² = Var(Xi).

It would be nice if T (X) had certain properties:

Its expected value should equal the parameter it’s trying to estimate.

It should have low variance.

ISYE 6739
Unbiased Estimation

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Unbiased Estimation

Lesson 2.5 — Unbiased Estimation

Definition: T (X) is unbiased for θ if E[T (X)] = θ.

Example/Theorem: Suppose X1, . . . , Xn are iid anything with mean µ. Then X̄ is always unbiased for µ:

    E[X̄] = E[(1/n) ∑_{i=1}^n Xi] = E[Xi] = µ.

That’s why X̄ is called the sample mean. □

Baby Example: In particular, suppose X1, . . . , Xn are iid Exp(λ). Then X̄ is unbiased for µ = E[Xi] = 1/λ.

But be careful. . . . 1/X̄ is biased for λ in this exponential case, i.e., E[1/X̄] ≠ 1/E[X̄] = λ. □

ISYE 6739
Unbiased Estimation

Example/Theorem: Suppose X1, . . . , Xn are iid anything with mean µ and variance σ². Then S² is always unbiased for σ²:

    E[S²] = E[(1/(n−1)) ∑_{i=1}^n (Xi − X̄)²] = Var(Xi) = σ².

This is why S² is called the sample variance. □

Baby Example: Suppose X1, . . . , Xn are iid Exp(λ). Then S² is unbiased for Var(Xi) = 1/λ². □

ISYE 6739
Unbiased Estimation

Proof (of general result): First, some algebra gives

    ∑_{i=1}^n (Xi − X̄)² = ∑_{i=1}^n (Xi² − 2X̄Xi + X̄²)
                        = ∑_{i=1}^n Xi² − 2X̄ ∑_{i=1}^n Xi + nX̄²
                        = ∑_{i=1}^n Xi² − 2nX̄² + nX̄²
                        = ∑_{i=1}^n Xi² − nX̄².

So. . .

ISYE 6739
Unbiased Estimation

    E[S²] = (1/(n−1)) E[∑_{i=1}^n (Xi − X̄)²] = (1/(n−1)) E[∑_{i=1}^n Xi² − nX̄²]

          = (1/(n−1)) (∑_{i=1}^n E[Xi²] − nE[X̄²])

          = (n/(n−1)) (E[X1²] − E[X̄²])   (since the Xi’s are iid)

          = (n/(n−1)) (Var(X1) + (E[X1])² − Var(X̄) − (E[X̄])²)

          = (n/(n−1)) (σ² − σ²/n)   (since E[X1] = E[X̄] and Var(X̄) = σ²/n)

          = σ².  Done. □

Remark: S is not unbiased for the standard deviation σ.

ISYE 6739
Unbiased Estimation

Big Example: Suppose that X1, . . . , Xn ∼ iid Unif(0, θ), i.e., the pdf is
f(x) = 1/θ, for 0 < x < θ. Think of it this way: I give you a bunch of
random numbers between 0 and θ, and you have to guess what θ is.

We’ll look at three unbiased estimators for θ:

    Y1 = 2X̄.

    Y2 = ((n+1)/n) max_{1≤i≤n} Xi.

    Y3 = 12X̄ with probability 1/2, and −8X̄ with probability 1/2.

If they’re all unbiased, which one’s the best?

ISYE 6739
Unbiased Estimation

“Good” Estimator: Y1 = 2X̄.

Proof (that it’s unbiased): E[Y1] = 2E[X̄] = 2E[Xi] = θ. □

“Better” Estimator: Y2 = ((n+1)/n) max_{1≤i≤n} Xi.

Why might this estimator for θ make sense? (We’ll say why it’s “better” in a little while.)

Proof (that it’s unbiased): E[Y2] = ((n+1)/n) E[max_i Xi] = θ iff E[max_i Xi] = nθ/(n+1) (which is what we’ll show below).

ISYE 6739
Unbiased Estimation

First, let’s get the cdf of M ≡ max_i Xi:

    P(M ≤ y) = P(X1 ≤ y and X2 ≤ y and · · · and Xn ≤ y)
             = P(X1 ≤ y) P(X2 ≤ y) · · · P(Xn ≤ y)   (Xi’s independent)
             = [P(X1 ≤ y)]^n   (Xi’s identically distributed)
             = (∫_0^y f_{X1}(x) dx)^n
             = (∫_0^y (1/θ) dx)^n
             = (y/θ)^n.

ISYE 6739
Unbiased Estimation

This implies that the pdf of M is

    f_M(y) ≡ (d/dy)(y/θ)^n = n y^{n−1}/θ^n,   0 < y < θ,

and this implies that

    E[M] = ∫_0^θ y f_M(y) dy = ∫_0^θ n y^n/θ^n dy = nθ/(n+1).

Whew! This finally shows that Y2 = ((n+1)/n) max_{1≤i≤n} Xi is an unbiased estimator for θ! □

Lastly, let’s look at. . .

ISYE 6739
Unbiased Estimation

“Ugly” Estimator: Y3 = 12X̄ with probability 1/2, and −8X̄ with probability 1/2.

Ha! It’s possible to get a negative estimate for θ, which is strange since θ > 0!

Proof (that it’s unbiased):

    E[Y3] = 12E[X̄] · (1/2) − 8E[X̄] · (1/2) = 2E[X̄] = θ. □

Usually, it’s good for an estimator to be unbiased, but the “ugly” estimator Y3
shows that unbiased estimators can sometimes be goofy.

Therefore, let’s look at some other properties an estimator can have.

ISYE 6739
Unbiased Estimation

For instance, consider the variance of an estimator.

Big Example (cont’d): Again suppose that X1, . . . , Xn ∼ iid Unif(0, θ).

Recall that both Y1 = 2X̄ and Y2 = ((n+1)/n) M are unbiased for θ.

Let’s find Var(Y1) and Var(Y2). First,

    Var(Y1) = 4 Var(X̄) = (4/n) Var(Xi) = (4/n) (θ²/12) = θ²/(3n).

ISYE 6739
Unbiased Estimation

Meanwhile,

    Var(Y2) = ((n+1)/n)² Var(M)
            = ((n+1)/n)² E[M²] − (((n+1)/n) E[M])²
            = ((n+1)/n)² ∫_0^θ n y^{n+1}/θ^n dy − θ²
            = θ² (n+1)²/(n(n+2)) − θ² = θ²/(n(n+2)) < θ²/(3n).

Thus, both Y1 and Y2 are unbiased, but Y2 has much lower variance than Y1.
We can break the “unbiasedness tie” by choosing Y2. □
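
A small Monte Carlo check (a sketch assuming numpy; the θ value, sample size, and replication count are arbitrary choices) that both estimators average out to θ while Y2’s variance is far smaller:

```python
# Compare Y1 = 2*Xbar and Y2 = ((n+1)/n)*max(X) for Unif(0, theta) by simulation.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 10.0, 20, 100_000

x = rng.uniform(0.0, theta, size=(reps, n))
y1 = 2.0 * x.mean(axis=1)
y2 = (n + 1) / n * x.max(axis=1)

print(y1.mean(), y2.mean())                 # both close to theta = 10 (unbiased)
print(y1.var(), theta**2 / (3 * n))         # simulated vs. theoretical theta^2/(3n)
print(y2.var(), theta**2 / (n * (n + 2)))   # vs. theta^2/(n(n+2)), much smaller
```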

ISYE 6739
Mean Squared Error

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Mean Squared Error

Lesson 2.6 — Mean Squared Error

We’ll now talk about a statistical performance measure that combines information about the bias and the variance of an estimator.

Definition: The Mean Squared Error (MSE) of an estimator T(X) of θ is

    MSE(T(X)) ≡ E[(T(X) − θ)²].

Before giving an easier interpretation of MSE, define the bias of an estimator for the parameter θ,

    Bias(T(X)) ≡ E[T(X)] − θ.

ISYE 6739
Mean Squared Error

Theorem/Proof: Easier interpretation of MSE.

    MSE(T(X)) = E[(T(X) − θ)²]
              = E[T²] − 2θE[T] + θ²
              = E[T²] − (E[T])² + (E[T])² − 2θE[T] + θ²
              = Var(T) + (E[T] − θ)²,

where E[T] − θ is the Bias. So MSE = Bias² + Var, and thus combines the bias and variance of an estimator. □

ISYE 6739
Mean Squared Error

The lower the MSE the better. If T1 (X) and T2 (X) are two estimators of θ,
we’d usually prefer the one with the lower MSE — even if it happens to have
higher bias.

Definition: The relative efficiency of T2(X) to T1(X) is MSE(T1(X))/MSE(T2(X)). If this quantity is < 1, then we’d want T1(X).

Example: Suppose that estimator A has bias = 3 and variance = 10, while
estimator B has bias = −2 and variance = 14. Which estimator (A or B) has
the lower mean squared error?

Solution: MSE = Bias² + Var, so

MSE(A) = 9 + 10 = 19 and MSE(B) = 4 + 14 = 18.

Thus, B has lower MSE. □

ISYE 6739
Mean Squared Error

Example: X1, . . . , Xn ∼ iid Unif(0, θ).

Two estimators: Y1 = 2X̄, and Y2 = ((n+1)/n) max_i Xi.

Showed before E[Y1] = E[Y2] = θ (so both estimators are unbiased).

Also, Var(Y1) = θ²/(3n), and Var(Y2) = θ²/(n(n+2)).

Thus,

    MSE(Y1) = θ²/(3n)   and   MSE(Y2) = θ²/(n(n+2)),

so Y2 is better (by an order of magnitude, actually). □

ISYE 6739
Maximum Likelihood Estimation

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Maximum Likelihood Estimation

Lesson 2.7 — Maximum Likelihood Estimation

Definition: Consider an iid random sample X1, . . . , Xn, where each Xi has pmf/pdf f(x). Further, suppose that θ is some unknown parameter from Xi.

The likelihood function is L(θ) ≡ ∏_{i=1}^n f(xi).

The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes L(θ). The MLE is a function of the Xi’s and is a RV.

Remark: We can very informally regard the MLE as the “most likely”
estimate of θ.

ISYE 6739
Maximum Likelihood Estimation

Example: Suppose X1, . . . , Xn ∼ iid Exp(λ). Find the MLE for λ.

First of all, the likelihood function is

    L(λ) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n λe^{−λxi} = λ^n exp(−λ ∑_{i=1}^n xi).

Now maximize L(λ) with respect to λ. Could take the derivative and plow through all of the horrible algebra. Too tedious. Need a trick. . . .

Useful Trick: Since the natural log function is one-to-one, it’s easy to see that the λ that maximizes L(λ) also maximizes ln(L(λ))!

    ln(L(λ)) = ln(λ^n exp(−λ ∑_{i=1}^n xi)) = n ln(λ) − λ ∑_{i=1}^n xi.

ISYE 6739
Maximum Likelihood Estimation

The trick makes our job less horrible.

    (d/dλ) ln(L(λ)) = (d/dλ)[n ln(λ) − λ ∑_{i=1}^n xi] = n/λ − ∑_{i=1}^n xi ≡ 0.

This implies that the MLE is λ̂ = 1/X̄. □

Remarks:
λ̂ = 1/X̄ makes sense, since E[X] = 1/λ.
At the end, we put a little hat over λ to indicate that this is the MLE. It’s like a party hat!
At the end, we make all of the little xi’s into big Xi’s to indicate that this is a random variable.
Just to be careful, you “probably” ought to do a second-derivative test.
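
If you’d rather let the computer do the maximization, here is a minimal sketch (assuming numpy and scipy are available; the simulated data and search bounds are our own choices) that maximizes the Exp(λ) log-likelihood numerically and confirms it lands on 1/X̄:

```python
# Numerically maximize the Exp(lambda) log-likelihood and compare with 1/X-bar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)      # true lambda = 1/scale = 0.5

def neg_log_lik(lam):
    # negative of n*ln(lambda) - lambda*sum(x)
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / x.mean())                  # numerical MLE vs. 1/X-bar (they agree)
```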

ISYE 6739
Maximum Likelihood Estimation

Example: Suppose X1, . . . , Xn ∼ iid Bern(p). Find the MLE for p.

Useful trick for this problem: Since Xi = 1 w.p. p and Xi = 0 w.p. 1 − p, we can write the pmf as

    f(x) = p^x (1 − p)^{1−x},   x = 0, 1.

Thus, the likelihood function is

    L(p) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n p^{xi} (1 − p)^{1−xi} = p^{∑_{i=1}^n xi} (1 − p)^{n − ∑_{i=1}^n xi}.

ISYE 6739
Maximum Likelihood Estimation

This implies that

    ln(L(p)) = (∑_{i=1}^n xi) ln(p) + (n − ∑_{i=1}^n xi) ln(1 − p)

    ⇒ (d/dp) ln(L(p)) = (∑_i xi)/p − (n − ∑_i xi)/(1 − p) ≡ 0

    ⇒ (1 − p) ∑_{i=1}^n xi = p (n − ∑_{i=1}^n xi)

    ⇒ p̂ = X̄.

This makes sense since E[X] = p. □

ISYE 6739
Trickier MLE Examples

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Trickier MLE Examples

Lesson 2.8 — Trickier MLE Examples

Example: X1, . . . , Xn ∼ iid Nor(µ, σ²). Get simultaneous MLEs for µ and σ².

    L(µ, σ²) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n (1/√(2πσ²)) exp{−(1/2)(xi − µ)²/σ²}
             = (2πσ²)^{−n/2} exp{−(1/2) ∑_{i=1}^n (xi − µ)²/σ²}.

    ⇒ ln(L(µ, σ²)) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/(2σ²)) ∑_{i=1}^n (xi − µ)²

    ⇒ (∂/∂µ) ln(L(µ, σ²)) = (1/σ²) ∑_{i=1}^n (xi − µ) ≡ 0,

and so µ̂ = X̄.

ISYE 6739
Trickier MLE Examples

Similarly, take the partial with respect to σ² (not σ),

    (∂/∂σ²) ln(L(µ, σ²)) = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^n (xi − µ̂)² ≡ 0,

and eventually get

    σ̂² = (1/n) ∑_{i=1}^n (Xi − X̄)². □

Remark: Notice how close σ̂² is to the (unbiased) sample variance,

    S² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)² = (n/(n−1)) σ̂².

σ̂² is a little bit biased, but it has slightly less variance than S². Anyway, as n gets big, S² and σ̂² become the same.
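
A quick numerical illustration of the n vs. n − 1 point (a sketch assuming numpy; the simulated data are arbitrary):

```python
# The MLE sigma-hat^2 divides by n (ddof=0); S^2 divides by n-1 (ddof=1).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=25)

mu_hat = x.mean()                    # MLE of mu
sigma2_hat = x.var(ddof=0)           # MLE of sigma^2 (divide by n)
s2 = x.var(ddof=1)                   # unbiased sample variance (divide by n-1)
print(mu_hat, sigma2_hat, s2, len(x) / (len(x) - 1) * sigma2_hat)  # last two match
```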

ISYE 6739
Trickier MLE Examples

Example: The pdf of the Gamma distribution w/ parameters r and λ is

    f(x) = (λ^r/Γ(r)) x^{r−1} e^{−λx},   x > 0.

Suppose X1, . . . , Xn ∼ iid Gam(r, λ). Find the MLEs for r and λ.

    L(r, λ) = ∏_{i=1}^n f(xi) = (λ^{nr}/[Γ(r)]^n) (∏_{i=1}^n xi)^{r−1} e^{−λ ∑_i xi}

    ⇒ ln(L) = rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏_{i=1}^n xi) − λ ∑_{i=1}^n xi

    ⇒ (∂/∂λ) ln(L) = rn/λ − ∑_{i=1}^n xi ≡ 0,

so that λ̂ = r̂/X̄.

ISYE 6739
Trickier MLE Examples

The Trouble in River City is, we need to find r̂. To do so, we have

    (∂/∂r) ln(L) = (∂/∂r)[rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏_{i=1}^n xi) − λ ∑_{i=1}^n xi]
                 = n ln(λ) − (n/Γ(r)) (d/dr)Γ(r) + ln(∏_{i=1}^n xi)
                 = n ln(λ) − nΨ(r) + ln(∏_{i=1}^n xi) ≡ 0,

where Ψ(r) ≡ Γ′(r)/Γ(r) is the digamma function.

ISYE 6739
Trickier MLE Examples

At this point, substitute in λ̂ = r̂/X̄, and use a computer method (bisection, Newton’s method, etc.) to search for the value of r that solves

    n ln(r/X̄) − nΨ(r) + ln(∏_{i=1}^n xi) ≡ 0.

The gamma function is readily available in any reasonable software package; but if the digamma function happens to be unavailable in your town, you can take advantage of the approximation

    Γ′(r) ≈ [Γ(r + h) − Γ(r)]/h   (for any small h of your choosing). □
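
One possible implementation of this search (a sketch, assuming scipy’s digamma and brentq root-finder are available; the simulated data, bracketing interval, and seed are our own choices):

```python
# Solve n*ln(r/X-bar) - n*Psi(r) + sum(ln x_i) = 0 for r, with lambda-hat = r-hat/X-bar.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=1000)   # true r = 3, lambda = 2
n, xbar, sum_log = len(x), x.mean(), np.log(x).sum()

def score(r):
    return n * np.log(r / xbar) - n * digamma(r) + sum_log

r_hat = brentq(score, 0.01, 100.0)   # bracketing interval chosen wide enough for a sign change
lam_hat = r_hat / xbar
print(r_hat, lam_hat)                # should be near 3 and 2
```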

ISYE 6739
Trickier MLE Examples

Example: Suppose X1, . . . , Xn ∼ iid Unif(0, θ). Find the MLE for θ.

The pdf is f(x) = 1/θ, 0 < x < θ (beware of the funny limits). Then

    L(θ) = ∏_{i=1}^n f(xi) = 1/θ^n   if 0 ≤ xi ≤ θ, ∀i.

In order to have L(θ) > 0, we must have 0 ≤ xi ≤ θ, ∀i. In other words, we must have θ ≥ max_i xi.

Subject to this constraint, L(θ) = 1/θ^n is maximized at the smallest possible θ value, namely, θ̂ = max_i Xi.

This makes sense in light of the similar (unbiased) estimator, Y2 = ((n+1)/n) max_i Xi, from a previous lesson. □

Remark: We used very little calculus in this example!

ISYE 6739
Invariance Property of MLEs

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Invariance Property of MLEs

Lesson 2.9 — Invariance Property of MLEs

We can get MLEs of functions of parameters almost for free!

Theorem (Invariance Property): If θ̂ is the MLE of some parameter θ and h(·) is any reasonable function, then h(θ̂) is the MLE of h(θ).

Remark: We noted before that such a property does not hold for unbiasedness. For instance, although E[S²] = σ², it is usually the case that E[√S²] ≠ σ.

Remark: The proof of the Invariance Property is “easy” when h(·) is a one-to-one function. It’s not so easy — but still generally true — when h(·) is nastier.

ISYE 6739
Invariance Property of MLEs

Example: Suppose X1, . . . , Xn ∼ iid Nor(µ, σ²).

We saw that the MLE for σ² is σ̂² = (1/n) ∑_{i=1}^n (Xi − X̄)².

If we consider the function h(y) = +√y, then the Invariance Property says that the MLE of σ is

    σ̂ = √σ̂² = √((1/n) ∑_{i=1}^n (Xi − X̄)²). □

Example: Suppose X1, . . . , Xn ∼ iid Bern(p).

We saw that the MLE for p is p̂ = X̄. Then Invariance says that the MLE for Var(Xi) = p(1 − p) is p̂(1 − p̂) = X̄(1 − X̄). □

ISYE 6739
Invariance Property of MLEs

Example: Suppose X1, . . . , Xn ∼ iid Exp(λ).

We define the survival function as

    F̄(x) = P(X > x) = 1 − F(x) = e^{−λx}.

In addition, we saw that the MLE for λ is λ̂ = 1/X̄.

Then Invariance says that the MLE of F̄(x) is

    e^{−λ̂x} = e^{−x/X̄}.

This kind of thing is used all of the time in the actuarial sciences. □

ISYE 6739
Method of Moments Estimation

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Method of Moments Estimation

Lesson 2.10 — Method of Moments Estimation

Recall that the kth moment of a random variable X is

    µk ≡ E[X^k] = ∑_x x^k f(x) if X is discrete,   or   ∫_ℝ x^k f(x) dx if X is continuous.

Definition: Suppose X1, . . . , Xn are iid random variables. Then the method of moments (MoM) estimator for µk is mk ≡ ∑_{i=1}^n Xi^k / n.

Remark: As n → ∞, the Law of Large Numbers implies that ∑_{i=1}^n Xi^k / n → E[X^k], i.e., mk → µk (so this is a good estimator).

Remark: You should always love your MoM!

ISYE 6739
Method of Moments Estimation

Examples:

The MoM estimator for the true mean µ1 = µ = E[Xi] is the sample mean m1 = X̄ = ∑_{i=1}^n Xi / n.

The MoM estimator for µ2 = E[Xi²] is m2 = ∑_{i=1}^n Xi² / n.

The MoM estimator for Var(Xi) = E[Xi²] − (E[Xi])² = µ2 − µ1² is

    m2 − m1² = (1/n) ∑_{i=1}^n Xi² − X̄² = ((n−1)/n) S².

(For large n, it’s also OK to use S².)

General Game Plan: Express the parameter of interest in terms of the true moments µk = E[X^k]. Then substitute in the sample moments mk.

ISYE 6739
Method of Moments Estimation

Example: Suppose X1, . . . , Xn ∼ iid Pois(λ).

Since λ = E[Xi], a MoM estimator for λ is X̄.

But also note that λ = Var(Xi), so another MoM estimator for λ is ((n−1)/n) S² (or plain old S²). □

Usually use the easier-looking estimator if you have a choice.

Example: Suppose X1, . . . , Xn ∼ iid Nor(µ, σ²).

MoM estimators for µ and σ² are X̄ and ((n−1)/n) S² (or S²), respectively.

For this example, these estimators are the same as the MLEs. □

Let’s finish up with a less-trivial example. . . .

ISYE 6739
Method of Moments Estimation

Example: Suppose X1, . . . , Xn ∼ iid Beta(a, b). The pdf is

    f(x) = (Γ(a + b)/(Γ(a)Γ(b))) x^{a−1} (1 − x)^{b−1},   0 < x < 1.

It turns out (after lots of algebra) that

    E[X] = a/(a + b)   and   Var(X) = ab/((a + b)²(a + b + 1)).

Let’s estimate a and b via MoM.

ISYE 6739
Method of Moments Estimation

We have

    E[X] = a/(a + b)   ⇒   a = b E[X]/(1 − E[X]) ≈ b X̄/(1 − X̄),   (1)

so

    Var(X) = ab/((a + b)²(a + b + 1)) = E[X] b/((a + b)(a + b + 1)).

Plug into the above X̄ for E[X], S² for Var(X), and bX̄/(1 − X̄) for a. Then after lots of algebra, we can solve for b:

    b ≈ (1 − X̄)² X̄ / S² − 1 + X̄.

To finish up, you can plug back into Equation (1) to get the MoM estimator for a.

ISYE 6739
Method of Moments Estimation

Example: Consider the following data set consisting of n = 10 observations that we have obtained from a Beta distribution.

    0.86  0.77  0.84  0.38  0.83  0.54  0.77  0.94  0.37  0.40

We immediately have X̄ = 0.67 and S² = 0.04971. Then the MoM estimators are

    b ≈ (1 − X̄)² X̄ / S² − 1 + X̄ = 1.1377,

and then

    a ≈ b X̄/(1 − X̄) = 2.310. □
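
A minimal plain-Python sketch (not from the slides) reproducing these MoM numbers from the ten observations:

```python
# Beta MoM estimators for the 10 observations above.
data = [0.86, 0.77, 0.84, 0.38, 0.83, 0.54, 0.77, 0.94, 0.37, 0.40]

n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

b = (1 - xbar) ** 2 * xbar / s2 - 1 + xbar     # MoM estimator for b
a = b * xbar / (1 - xbar)                      # plug back into Equation (1)
print(round(xbar, 2), round(s2, 5), round(b, 4), round(a, 3))  # 0.67 0.04971 1.1377 2.31
```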

ISYE 6739
Sampling Distributions

Outline

1 Introduction to Descriptive Statistics


2 Summarizing Data
3 Candidate Distributions
4 Introduction to Estimation
5 Unbiased Estimation
6 Mean Squared Error
7 Maximum Likelihood Estimation
8 Trickier MLE Examples
9 Invariance Property of MLEs
10 Method of Moments Estimation
11 Sampling Distributions

ISYE 6739
Sampling Distributions

Introduction and Normal Distribution

Goal: Talk about some distributions we’ll need later to do “confidence intervals” (CIs) and “hypothesis tests”: Normal, χ², t, and F.

Definition: Recall that a statistic is just a function of the observations X1, . . . , Xn from a random sample. The function does not depend explicitly on any unknown parameters.

Example: X̄ and S² are statistics, but (X̄ − µ)/σ is not.

Since statistics are RV’s, it’s useful to figure out their distributions.
The distribution of a statistic is called a sampling distribution.

Example: X1, . . . , Xn ∼ iid Nor(µ, σ²) ⇒ X̄ ∼ Nor(µ, σ²/n).

The normal is used to get CIs and do hypothesis tests for µ.

ISYE 6739
Sampling Distributions

χ2 Distribution
Definition/Theorem: If Z1, . . . , Zk ∼ iid Nor(0, 1), then Y ≡ ∑_{i=1}^k Zi² has the chi-squared distribution with k degrees of freedom (df), and we write Y ∼ χ²(k).

The term “df” informally corresponds to the number of “independent pieces of information” you have. For example, if you have RV’s X1, . . . , Xn such that ∑_{i=1}^n Xi = c, a known constant, then you might have n − 1 df, since knowledge of any n − 1 of the Xi’s gives you the remaining Xi.

We also informally “lose” a degree of freedom every time we have to estimate a parameter. For instance, if we have access to n observations, but have to estimate two parameters µ and σ², then we might only end up with n − 2 df.

In reality, df corresponds to the number of dimensions of a certain space (not covered in this course)!

ISYE 6739
Sampling Distributions

The pdf of the chi-squared distribution is

    f_Y(y) = (1/(2^{k/2} Γ(k/2))) y^{k/2 − 1} e^{−y/2},   y > 0.

Fun Facts: Can show that E[Y] = k, and Var(Y) = 2k.

The exponential distribution is a special case of the chi-squared distribution. In fact, χ²(2) ∼ Exp(1/2).

Proof: Just plug k = 2 into the pdf. □

For k > 2, the χ2 (k) pdf is skewed to the right. (You get an occasional
“large” observation.)

For large k, the χ2 (k) is approximately normal (by the CLT).

ISYE 6739
Sampling Distributions

Definition: The (1 − α) quantile of a RV X is that value xα such that P(X > xα) = 1 − F(xα) = α. Note that xα = F^{−1}(1 − α), where F^{−1}(·) is the inverse cdf of X.

Notation: If Y ∼ χ²(k), then we denote the (1 − α) quantile with the special symbol χ²_{α,k} (instead of xα). In other words, P(Y > χ²_{α,k}) = α.
You can look up χ²_{α,k}, e.g., in a table at the back of the book or via the Excel function CHISQ.INV(1 − α, k).

Example: If Y ∼ χ²(10), then

    P(Y > χ²_{0.05,10}) = 0.05,

where we can look up χ²_{0.05,10} = 18.31. □
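
If you prefer software to tables, the same quantile comes straight from the inverse cdf; for example, a minimal sketch assuming scipy is installed:

```python
# The (1 - alpha) quantile of chi^2(10), i.e., chi^2_{0.05,10}.
from scipy.stats import chi2

print(chi2.ppf(0.95, df=10))        # about 18.307
print(1 - chi2.cdf(18.307, df=10))  # about 0.05, i.e., P(Y > 18.307) = 0.05
```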

ISYE 6739
Sampling Distributions

Theorem: χ²’s add up. If Y1, . . . , Yn are independent with Yi ∼ χ²(di) for all i, then ∑_{i=1}^n Yi ∼ χ²(∑_{i=1}^n di).

Proof: Just use mgf’s. Won’t go thru it here. □

So where does the χ² distribution come up in statistics?

It usually arises when we try to estimate σ².

Example: If X1, . . . , Xn ∼ iid Nor(µ, σ²), then, as we’ll show in the next module,

    S² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)² ∼ σ² χ²(n − 1)/(n − 1). □

ISYE 6739
Sampling Distributions

t Distribution

Definition/Theorem: Suppose that Z ∼ Nor(0, 1), Y ∼ χ²(k), and Z and Y are independent. Then T ≡ Z/√(Y/k) has the Student t distribution with k degrees of freedom, and we write T ∼ t(k).

The pdf is

    f_T(x) = [Γ((k+1)/2) / (√(πk) Γ(k/2))] (x²/k + 1)^{−(k+1)/2},   x ∈ ℝ.

Fun Facts: The t(k) looks like the Nor(0,1), except the t has fatter tails.

The k = 1 case gives the Cauchy distribution, which has really fat tails.

As the degrees of freedom k becomes large, t(k) → Nor(0, 1).

Can show that E[T] = 0 for k > 1, and Var(T) = k/(k − 2) for k > 2.
ISYE 6739
Sampling Distributions

Notation: If T ∼ t(k), then we denote the (1 − α) quantile by t_{α,k}. In other words, P(T > t_{α,k}) = α.

Example: If T ∼ t(10), then P(T > t_{0.05,10}) = 0.05, where we find t_{0.05,10} = 1.812 in the back of the book or via the Excel function T.INV(1 − α, k). □

Remarks: So what do we use the t distribution for in statistics?

It’s used when we find confidence intervals and conduct hypothesis tests for the mean µ. Stay tuned.

By the way, why did I originally call it the Student t distribution?

“Student” is the pseudonym of the guy (William Gosset) who first derived it. Gosset was a statistician at the Guinness Brewery.

ISYE 6739
Sampling Distributions

F Distribution

Definition/Theorem: Suppose that X ∼ χ²(n), Y ∼ χ²(m), and X and Y are independent. Then F ≡ (X/n)/(Y/m) = mX/(nY) has the F distribution with n and m df, denoted F ∼ F(n, m).

The pdf is

    f_F(x) = [Γ((n+m)/2) (n/m)^{n/2} x^{n/2 − 1}] / [Γ(n/2) Γ(m/2) ((n/m)x + 1)^{(n+m)/2}],   x > 0.

Fun Facts: The F(n, m) is usually a bit skewed to the right.

Note that you have to specify two df’s.

Can show that E[F] = m/(m − 2) (m > 2), and Var(F) = blech.

t distribution is a special case — can you figure out which?


ISYE 6739
Sampling Distributions

Notation: If F ∼ F(n, m), then we denote the (1 − α) quantile by F_{α,n,m}. That is, P(F > F_{α,n,m}) = α.

Tables can be found in the back of the book for various α, n, m, or you can use the Excel function F.INV(1 − α, n, m).

Example: If F ∼ F(5, 10), then P(F > F_{0.05,5,10}) = 0.05, where we find F_{0.05,5,10} = 3.326. □

Remarks: It can be shown that F_{1−α,m,n} = 1/F_{α,n,m}. Use this fact if you have to find something like F_{0.95,10,5} = 1/F_{0.05,5,10} = 1/3.326.
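
The same lookups, and the reciprocal identity, can be checked with scipy’s inverse cdf (a minimal sketch, assuming scipy is available):

```python
# F quantile lookup and the identity F_{1-alpha,m,n} = 1/F_{alpha,n,m}.
from scipy.stats import f

f_05_5_10 = f.ppf(0.95, dfn=5, dfd=10)               # about 3.326
print(f_05_5_10)
print(f.ppf(0.05, dfn=10, dfd=5), 1.0 / f_05_5_10)   # both about 0.3007
```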

So what do we use the F distribution for in statistics?

It’s used when we find confidence intervals and conduct hypothesis tests for
the ratio of variances from two different processes. Details later.

ISYE 6739
