Module 5

CL202: Introduction to Data Analysis

Mani Bhushan, Sachin Patwardhan
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076
mbhushan,[email protected]

Instructor: Sharad Bhartiya

Spring 2021



Today’s lecture:

Sampling Statistics and their Distributions


Chapter 6 of Ross



Probability and Statistics (1)

So far, we have looked at probability.

- Start with a "model" (such as a probability density function or
  distribution function) describing what events we think will occur
  and with what likelihoods.
- Typically about prediction: looking forward. Predict what data
  we will observe from the given rules.
- Probability is mathematics: no ambiguity.

Statistics

- Statistics is typically about looking backwards: extract rules
  (unknown parameters) from given data.
- An art to some extent: how to summarize data (mean/median)
  or visualize data (histogram, bar plot, pie chart, etc.), and what
  is a good hypothesis to test at what significance level, etc.



Probability and Statistics (2)

Probability and Statistics are linked.

- Use statistics to test hypotheses about population parameters,
  extract relationships, etc.
- Use the extracted models to predict the future.
- Observe new data (new statistics), refine the probability models,
  and continue.



Statistics, Population and Sample

Statistics deals with drawing conclusions from observed data.

Population: A large (possibly infinite) collection of items that have
measurable values associated with them.
Sample: A finite collection of items from the population that is
observed.
Problem: Use the sample to draw inferences about the population.



Populations and samples

We sample to draw some conclusions about an entire population.


Examples: Exit polls, census.
We assume that the population can be described by a probability
distribution.
We assume that the samples give us measurements following
this distribution.



Formally...

If X1, ..., Xn are independent random variables having a common
distribution F, i.e., X1, ..., Xn are IID (independently and
identically distributed), then we say that they constitute a
sample (or a random sample) from the distribution F.
In most applications, F is not completely known and the problem is
to use samples to infer F.
Parametric inference: F is specified in terms of unknown
parameters; for example, height is Gaussian with unknown mean and
variance.
Non-parametric inference: nothing is known about the form of F.
This course deals with parametric inference.



Distributions and Parametric Estimation

We wish to infer the parameters of a population distribution.
- Mean, variance, proportion, etc.
We use a statistic derived from the sample data.
- Sample mean, sample variance, etc.
A statistic is a random variable.
- What distribution does it follow?
- What are the parameters of its distribution?



Notation

Population parameters are typically denoted by Greek letters.
- µ = population mean.
- σ² = population variance.
The corresponding statistic is not in Greek:
- X̄ = sample mean.
- S² = sample variance.
We can also denote an estimate of the parameter θ as θ̂.
Regression:
- You want: y = αx + β.
- You get: y = ax + b.



What Can We Say About the Sample Mean?

Consider a population with mean µ and variance σ².
We have n random samples X1, X2, ..., Xn from this population.
Define the sample mean as

    X̄ = (X1 + · · · + Xn)/n

X̄ is a random variable since it is a function of the random
variables X1, ..., Xn.
What do we expect from X̄? With E[Xi] = µ and var(Xi) = σ²,

    E[X̄] = E[(X1 + · · · + Xn)/n] = (E[X1] + · · · + E[Xn])/n = nµ/n = µ

Why do we infer µ using the sample mean?

    E[X̄] = µ

X̄ is an unbiased estimator of µ.
For a statistic θ̂ to be an unbiased estimator of θ, we require

    E[θ̂] = θ


What is the variance of X̄?

We had var(Xi) = σ². Then

    var(X̄) = var((X1 + · · · + Xn)/n)
           = (1/n²) var(X1 + · · · + Xn)
           = (1/n²) (var(X1) + · · · + var(Xn))    (independence!)
           = (1/n²) · nσ²
           = σ²/n



Xi versus X̄?

    E[Xi] = µ,   var(Xi) = σ²
    E[X̄] = µ,   var(X̄) = σ²/n

The variance of the sample mean decreases with increasing n.
Sample more and gain accuracy when using X̄ as an estimate of µ.
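
The σ²/n shrinkage is easy to see numerically. A minimal sketch, assuming a normal population with σ² = 4 (an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0  # assumed population variance

for n in (5, 25, 100):
    # 50,000 repeated samples of size n; look at the variance of the sample means.
    xbar = rng.normal(0.0, np.sqrt(sigma2), size=(50_000, n)).mean(axis=1)
    print(n, xbar.var(), sigma2 / n)  # empirical var(X̄) vs theoretical σ²/n
```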



Recall Chebyshev's Inequality

    P(|X − µ| ≥ k) ≤ σ²/k²

    P(|X − µ| ≥ kσ) ≤ 1/k²

It allows derivation of probability bounds when only the mean and
variance are known.
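
A quick numerical check of the second form, as a hedged sketch (the exponential population is an illustrative assumption; for it, µ = σ = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)  # mean 1, standard deviation 1
mu, sigma = 1.0, 1.0

for k in (2, 3, 4):
    tail = np.mean(np.abs(x - mu) >= k * sigma)  # empirical P(|X − µ| ≥ kσ)
    print(k, tail, 1 / k**2)                     # tail never exceeds the 1/k² bound
```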



Weak Law of Large Numbers

Let X1, X2, ..., Xn be a sequence of independently and identically
distributed random variables, with E[Xi] = µ, i = 1, 2, ..., n.
Then, for any ε > 0,

    P(|(X1 + X2 + ... + Xn)/n − µ| > ε) → 0 as n → ∞

This follows from Chebyshev's inequality.
It shows why the sample average X̄ = (1/n) Σᵢ₌₁ⁿ Xi is a good
estimate of µ.
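
A running-average simulation illustrates the convergence. A minimal sketch, assuming a Uniform(0, 1) population (so µ = 0.5):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=100_000)  # IID Uniform(0,1) draws

# Sample average after the first n observations, for each n.
running_avg = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 100, 10_000, 100_000):
    print(n, running_avg[n - 1])  # drifts toward 0.5 as n grows
```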



Sample Average

The sample average converges to the mean as n increases.
What is the distribution of the sample average?



The Central Limit Theorem

Theorem
Let X1, X2, ..., Xn be a sequence of independent and identically
distributed random variables, each having mean µ and variance σ².
Then, for large n, the distribution of

    X1 + X2 + · · · + Xn

is approximately normal with mean nµ and variance nσ².

That is, for large n,

    (X1 + · · · + Xn) ∼ N(nµ, nσ²)

    P(((X1 + · · · + Xn) − nµ)/(σ√n) < x) ≈ P(Z < x)

with Z being a standard normal random variable.
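
The statement is easy to probe by simulation. A minimal sketch, assuming a decidedly non-normal Uniform(0, 1) population (µ = 1/2, σ² = 1/12):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
mu, sigma2 = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0,1)

# 200,000 realizations of the standardized sum ((X1 + ... + Xn) - nµ)/(σ√n)
s = rng.uniform(0.0, 1.0, size=(200_000, n)).sum(axis=1)
z = (s - n * mu) / np.sqrt(n * sigma2)

# Compare an empirical probability with the standard normal value
print(np.mean(z < 1.0))  # close to P(Z < 1), about 0.8413
```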



Central Limit Theorem: History

One of the most powerful results in probability.

- Proposed in 1733 by the French mathematician A. de Moivre.
- Forgotten until Laplace published it in 1812, using the normal to
  approximate the binomial.
- In 1901, Lyapunov stated it in general terms and proved it
  formally.



The Central Limit Theorem

Consider the total obtained on rolling several dice.

[Figures: distributions of the total for an increasing number of dice,
approaching a bell shape (not reproduced)]
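
The dice figures can be regenerated with a short simulation. A minimal sketch (the trial count is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Empirical distribution of the total of k fair dice, for increasing k.
for k in (1, 2, 5, 10):
    totals = rng.integers(1, 7, size=(100_000, k)).sum(axis=1)
    counts = np.bincount(totals)[k:]  # frequencies of totals k, k+1, ..., 6k
    print(k, np.round(counts / counts.sum(), 3))  # flat at k=1, bell-shaped by k=10
```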


Central Limit Theorem (CLT) and the Binomial

Let X = X1 + · · · + Xn, where Xi is a Bernoulli variable (1 if
success and 0 if failure; probability of success p).

    E[Xi] = p,   var(Xi) = p(1 − p)
    E[X] = np,   var(X) = np(1 − p)

The CLT suggests that for large n,

    (X − np)/√(np(1 − p)) ∼ N(0, 1)

Rule of thumb: the binomial can be approximated well by a normal if

    np(1 − p) ≥ 10



Example 6.3c

Q Ideal size of a first year class at a college is 150 students. The


college, knowing from past experience that on the average only
30% of those accepted for admission will actually attend, uses a
policy of approving the applications of 450 students. Compute
the probability that more than 150 first year students attend this
college.
A X is the number of students that attend. Then X is a binomial
RV with n = 450 and p = 0.3 assuming each student
independently decides whether to attend or not.
Note X (binomial) is discrete while normal is continuous. Thus

P(X = i) = P(i − 0.5 < X < i + 0.5)



Example (continued)

Thus,

    P(X > 150) ≈ P(X > 150.5)
              = P((X − (450)(0.3))/√((450)(0.3)(0.7)) ≥ (150.5 − (450)(0.3))/√((450)(0.3)(0.7)))
              ≈ P(Z ≥ 1.59) = 0.06
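
The same numbers can be checked against the exact binomial. A minimal sketch using SciPy (the library choice is ours, not the slides'):

```python
from scipy.stats import binom, norm
import numpy as np

n, p = 450, 0.3                        # np(1 - p) = 94.5 ≥ 10, rule of thumb satisfied
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.sf(150, n, p)            # exact P(X > 150)
approx = norm.sf((150.5 - mu) / sd)    # normal approximation with continuity correction
print(exact, approx)                   # both about 0.06
```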



Approximate Distribution of the Sample Mean

Q: Let X1, ..., Xn be a sample from a population having mean µ and
variance σ². What is the distribution of the sample mean

    X̄ = (Σᵢ₌₁ⁿ Xi)/n ?

A: By the CLT, the distribution of Σᵢ₌₁ⁿ Xi is approximately normal
when n is large.
Then the distribution of X̄ is also (approximately) normal, since a
constant multiple of a normal RV is also a normal RV, with

    E[X̄] = µ,   var(X̄) = σ²/n

Hence,

    (X̄ − µ)/(σ/√n)

has approximately a standard normal distribution.

CLT and Sample Size for X̄ to be Normal

The CLT does not tell us how large the sample size n needs to be
for the normal approximation of X̄ to be valid.
The required n depends on the population distribution of the
sample data.
For the binomial we need np(1 − p) ≥ 10; for a normal population,
any n ≥ 1 works.
Rule of thumb: a sample size n ≥ 30 works for almost all
distributions, i.e., no matter how non-normal the underlying
population is, the sample mean of a sample of size at least 30
will be approximately normal.
In most cases, the normal approximation is valid for much
smaller sample sizes.



Densities of Sample Means of a Normal Population

If X ∼ N(0, 1), then X̄ ∼ N(0, 1/n).

[Figure: densities of X̄ for several values of n, narrowing around 0
as n grows (not reproduced)]


Sample Variance

Let X1, X2, ..., Xn be a random sample from a distribution with
mean µ and variance σ².
Let X̄ be the sample mean.
The statistic S² defined as

    S² = Σᵢ₌₁ⁿ (Xi − X̄)² / (n − 1)

is called the sample variance. It is a random variable.
S = √S² is called the sample standard deviation.
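
In code, the n − 1 convention corresponds to NumPy's ddof argument. A minimal sketch with made-up data:

```python
import numpy as np

x = np.array([4.2, 3.7, 5.1, 4.8, 4.4])  # illustrative sample

n = x.size
s2 = np.sum((x - x.mean()) ** 2) / (n - 1)  # sample variance, n - 1 denominator

# NumPy's ddof ("delta degrees of freedom") gives the same value:
assert np.isclose(s2, x.var(ddof=1))
print(s2, np.sqrt(s2))  # S² and the sample standard deviation S
```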



E[S²]?

We compute E[S²] as follows:

    (Xi − X̄)² = (Xi − µ + µ − X̄)²
              = (Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄)

    Σᵢ₌₁ⁿ (Xi − X̄)² = Σᵢ₌₁ⁿ (Xi − µ)² + Σᵢ₌₁ⁿ (µ − X̄)²
                     + 2 Σᵢ₌₁ⁿ (Xi − µ)(µ − X̄)

The middle term is

    Σᵢ₌₁ⁿ (µ − X̄)² = n(µ − X̄)²


E[S²] (Continued)

The last term is

    2 Σᵢ₌₁ⁿ (Xi − µ)(µ − X̄) = −2(X̄ − µ) Σᵢ₌₁ⁿ (Xi − µ)
                             = −2n(X̄ − µ)²

Hence,

    Σᵢ₌₁ⁿ (Xi − X̄)² = Σᵢ₌₁ⁿ (Xi − µ)² + n(µ − X̄)² − 2n(X̄ − µ)²
                     = Σᵢ₌₁ⁿ (Xi − µ)² − n(X̄ − µ)²

    Σᵢ₌₁ⁿ (Xi − X̄)²/(n − 1) = Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1) − n(X̄ − µ)²/(n − 1)



E[S²] (Continued)

We had

    S² = Σᵢ₌₁ⁿ (Xi − X̄)²/(n − 1) = Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1) − (n/(n − 1))(X̄ − µ)²

Taking expectations,

    E[S²] = E[Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1)] − (n/(n − 1)) E[(X̄ − µ)²]

But

    E[Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1)] = nσ²/(n − 1),
    E[(X̄ − µ)²] = var(X̄) = σ²/n

E[S²] (Continued)

This implies

    E[S²] = nσ²/(n − 1) − σ²/(n − 1) = σ²

or

    E[S²] = σ²

Therefore, S² is an unbiased estimator of σ².
This is the real reason for having n − 1 in the denominator of the
S² expression instead of n, and not really the degrees-of-freedom
argument given earlier.
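
A simulation makes the n − 1 vs n contrast visible. A minimal sketch, assuming a normal population with σ² = 9 and a deliberately small n:

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2 = 9.0  # assumed population variance
n = 5         # small n makes the bias of the 1/n estimator obvious

samples = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divides by n

# Averages: about 9 (= σ²) with n - 1, but about 7.2 (= (n-1)σ²/n) with n.
print(s2_unbiased.mean(), s2_biased.mean())
```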



Sampling Distributions from a Normal Population

Consider the distribution of the statistics (X̄, S²) obtained from
samples from a normal population.

Theorem
If X1, X2, ..., Xn is a sample (IID) from a normal population having
mean µ and variance σ², then X̄ and S² are independent random
variables, with X̄ being normal with mean µ and variance σ²/n, and
(n − 1)S²/σ² being a chi-square random variable with n − 1 degrees
of freedom.

In general, X̄ is only approximately normal, by the CLT. Here,
however, it is exactly normal, since a sum of normal RVs is normal.
(Show this using the moment generating function.)


Sampling Distributions from a Normal Population (Cont.)

- (n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom.
- X̄ and S² are independent random variables.
- A (non-rigorous) proof of the last two statements is in Ross.
- The independence of X̄ and S² is due to the normal population;
  it is not true in general.
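
Both claims can be probed by simulation. A minimal sketch (µ, σ, and n are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, mu, sigma = 10, 5.0, 2.0

samples = rng.normal(mu, sigma, size=(200_000, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2  # should follow a chi-square with n - 1 = 9 dof

# A chi-square with k degrees of freedom has mean k and variance 2k.
print(q.mean(), q.var())            # about 9 and about 18
# Near-zero correlation is consistent with (though weaker than) independence.
print(np.corrcoef(xbar, s2)[0, 1])  # about 0
```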



The Chi-Square Distribution

If Z1, Z2, ..., Zn are independent standard normal random
variables, then X, defined as

    X = Z1² + Z2² + ... + Zn²

is said to have a chi-square distribution with n degrees of
freedom, or

    X ∼ χ²ₙ

Let X1, X2 be independent chi-square random variables with n1
and n2 degrees of freedom, respectively. Then X1 + X2 is
chi-square with n1 + n2 degrees of freedom.
For X a chi-square RV with n degrees of freedom, the quantity χ²α,n
is defined such that

    P(X ≥ χ²α,n) = α
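
Percentiles χ²α,n are tabulated in Ross; in code they come from the chi-square inverse CDF. A minimal sketch using SciPy (our library choice):

```python
from scipy.stats import chi2

alpha, n = 0.05, 14

# χ²_{α,n} satisfies P(X ≥ χ²_{α,n}) = α; isf is the inverse survival function.
crit = chi2.isf(alpha, n)
print(crit)              # about 23.68 for α = 0.05, n = 14

# Check: the upper-tail probability at the critical value recovers α.
print(chi2.sf(crit, n))  # about 0.05
```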



Chi-Squared PDF Sketch

[Figure: sketch of chi-squared density functions (not reproduced)]


Chi-Squared RV

- The density function of a chi-squared RV involves the gamma
  function.
- (Without proof) For a chi-squared RV X with n degrees of
  freedom: E[X] = n, var(X) = 2n.



Example 6.5a: Illustrating the Use of the S² Distribution

Q: The time it takes a processor to carry out a particular
computation is normally distributed with mean 20 seconds and
standard deviation 3 seconds. If a sample of 15 such
computations is observed, what is the probability that the
sample variance is greater than 12?
A: Here n = 15 and σ² = 3² = 9. Then

    P(S² > 12) = P(14S²/9 > (14)(12)/9) = P(χ²₁₄ > 18.67)
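
The remaining chi-square tail probability can be read from a table or computed. A minimal sketch using SciPy:

```python
from scipy.stats import chi2

n, sigma2 = 15, 9.0
threshold = (n - 1) * 12.0 / sigma2  # (14)(12)/9, about 18.67

# P(S² > 12) = P(χ²₁₄ > 18.67), via the chi-square survival function.
print(chi2.sf(threshold, n - 1))     # about 0.18
```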



Summary

- For any sample: E[X̄] = µ, E[S²] = σ².
- CLT: a sum of IID RVs is approximately normal. As a result, X̄ is
  approximately normal with mean µ and variance σ²/n.
- When sampling from a normal population, X̄ is exactly normal and
  (n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom.
  Further, X̄ and S² are independent.



THANK YOU

