Module 5

CL202: Introduction to Data Analysis

Mani Bhushan, Sachin Patwardhan
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076
mbhushan,[email protected]

Instructor: Sharad Bhartiya

Spring 2021



Today’s lecture:

Sampling Statistics and their Distributions


Chapter 6 of Ross



Probability and Statistics (1)

So far, we have looked at probability.

- Start with a "model" (such as a probability density function or
  distribution function) describing what events we think will occur
  and with what likelihoods.
- Typically about prediction: looking forward. Predict what data
  we will observe from the given rules.
- Probability is mathematics: no ambiguity.

Statistics

- Statistics is typically about looking backwards: extract rules
  (unknown parameters) from given data.
- An art to some extent: how to summarize data (mean/median)
  or visualize data (histogram, bar plot, pie chart, etc.), and what
  is a good hypothesis to test at what significance level, etc.



Probability and Statistics (2)

Probability and Statistics are linked.

- Use statistics to test hypotheses about population parameters,
  extract relationships, etc.
- Use the extracted models to predict the future.
- Observe new data (new statistics), refine the probability models,
  and continue.



Statistics, Population and Sample

Statistics deals with drawing conclusions from observed data.

Population: A large (possibly infinite) collection of items that have
measurable values associated with them.
Sample: A finite collection of items from the population that is
observed.
Problem: Use the sample to draw inferences about the population.



Populations and samples

We sample to draw some conclusions about an entire population.


Examples: Exit polls, census.
We assume that the population can be described by a probability
distribution.
We assume that the samples give us measurements following
this distribution.



Formally...

If X1, ..., Xn are independent random variables having a common
distribution F, i.e., X1, ..., Xn are IID (independently and
identically distributed), then we say that they constitute a
sample (or a random sample) from the distribution F.
In most applications, F is not completely known and the problem is
to use samples to infer F.
Parametric inference: F is specified in terms of unknown
parameters; for example, height is Gaussian with unknown mean and
variance.
Non-parametric inference: nothing is known about the form of F.
This course deals with parametric inference.



Distributions and Parametric Estimation

We wish to infer the parameters of a population distribution.
- Mean, variance, proportion, etc.
We use a statistic derived from the sample data.
- Sample mean, sample variance, etc.
A statistic is a random variable.
- What distribution does it follow?
- What are the parameters of its distribution?



Notation

Population parameters are typically denoted by Greek letters.
- µ = population mean.
- σ² = population variance.
The corresponding statistic is not in Greek:
- X̄ = sample mean.
- S² = sample variance.
We can also denote an estimate of the parameter θ as θ̂.
Regression:
- You want: y = αx + β.
- You get: y = ax + b.



What Can We Say About the Sample Mean?

Consider a population with mean µ and variance σ².
We have n random samples X1, X2, ..., Xn from this population.
Define the sample mean as

    X̄ = (X1 + · · · + Xn)/n

X̄ is a random variable since it is a function of the random
variables X1, ..., Xn.
What do we expect from X̄? With E[Xi] = µ and var(Xi) = σ²,

    E[X̄] = E[(X1 + · · · + Xn)/n] = (E[X1] + · · · + E[Xn])/n = nµ/n = µ

Why do we infer µ using the sample mean?

    E[X̄] = µ

X̄ is an unbiased estimator of µ.
For a statistic θ̂ to be an unbiased estimator of θ, we require

    E[θ̂] = θ


What is the variance of X̄?

We had var(Xi) = σ². Then

    var(X̄) = var((X1 + · · · + Xn)/n)
           = (1/n²) var(X1 + · · · + Xn)
           = (1/n²) (var(X1) + · · · + var(Xn))    (independence!)
           = (1/n²) · nσ²
           = σ²/n



Xi versus X̄?

    E[Xi] = µ,   var(Xi) = σ²
    E[X̄] = µ,   var(X̄) = σ²/n

The variance of the sample mean decreases with increasing n.
Sample more and gain accuracy when using X̄ as an estimate of µ.
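
The σ²/n shrinkage is easy to see numerically. A minimal sketch, assuming a normal population with σ² = 4 (an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0  # assumed population variance

for n in (5, 25, 100):
    # 50,000 repeated samples of size n; look at the variance of the sample means.
    xbar = rng.normal(0.0, np.sqrt(sigma2), size=(50_000, n)).mean(axis=1)
    print(n, xbar.var(), sigma2 / n)  # empirical var(X̄) vs theoretical σ²/n
```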



Recall Chebyshev's Inequality

    P(|X − µ| ≥ k) ≤ σ²/k²

    P(|X − µ| ≥ kσ) ≤ 1/k²

It allows derivation of probability bounds when only the mean and
variance are known.
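
A quick numerical check of the second form, as a hedged sketch (the exponential population is an illustrative assumption; for it, µ = σ = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)  # mean 1, standard deviation 1
mu, sigma = 1.0, 1.0

for k in (2, 3, 4):
    tail = np.mean(np.abs(x - mu) >= k * sigma)  # empirical P(|X − µ| ≥ kσ)
    print(k, tail, 1 / k**2)                     # tail never exceeds the 1/k² bound
```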



Weak Law of Large Numbers

Let X1, X2, ..., Xn be a sequence of independently and identically
distributed random variables, with E[Xi] = µ, i = 1, 2, ..., n.
Then, for any ε > 0,

    P(|(X1 + X2 + ... + Xn)/n − µ| > ε) → 0 as n → ∞

This follows from Chebyshev's inequality.
It shows why the sample average X̄ = (1/n) Σᵢ₌₁ⁿ Xi is a good
estimate of µ.
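
A running-average simulation illustrates the convergence. A minimal sketch, assuming a Uniform(0, 1) population (so µ = 0.5):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=100_000)  # IID Uniform(0,1) draws

# Sample average after the first n observations, for each n.
running_avg = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 100, 10_000, 100_000):
    print(n, running_avg[n - 1])  # drifts toward 0.5 as n grows
```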



Sample Average

The sample average converges to the mean as n increases.
What is the distribution of the sample average?



The Central Limit Theorem

Theorem
Let X1, X2, ..., Xn be a sequence of independent and identically
distributed random variables, each having mean µ and variance σ².
Then, for large n, the distribution of

    X1 + X2 + · · · + Xn

is approximately normal with mean nµ and variance nσ².

That is, for large n,

    (X1 + · · · + Xn) ∼ N(nµ, nσ²)

    P(((X1 + · · · + Xn) − nµ)/(σ√n) < x) ≈ P(Z < x)

with Z being a standard normal random variable.
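
The statement is easy to probe by simulation. A minimal sketch, assuming a decidedly non-normal Uniform(0, 1) population (µ = 1/2, σ² = 1/12):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
mu, sigma2 = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0,1)

# 200,000 realizations of the standardized sum ((X1 + ... + Xn) - nµ)/(σ√n)
s = rng.uniform(0.0, 1.0, size=(200_000, n)).sum(axis=1)
z = (s - n * mu) / np.sqrt(n * sigma2)

# Compare an empirical probability with the standard normal value
print(np.mean(z < 1.0))  # close to P(Z < 1), about 0.8413
```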



Central Limit Theorem: History

One of the most powerful results in probability.

- Proposed in 1733 by the French mathematician A. de Moivre.
- Forgotten until Laplace published it in 1812, using the normal to
  approximate the binomial.
- In 1901, Lyapunov stated it in general terms and proved it
  formally.



The Central Limit Theorem

Consider the total obtained on rolling several dice.

[Figures: distributions of the total for an increasing number of dice,
approaching a bell shape (not reproduced)]
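
The dice figures can be regenerated with a short simulation. A minimal sketch (the trial count is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Empirical distribution of the total of k fair dice, for increasing k.
for k in (1, 2, 5, 10):
    totals = rng.integers(1, 7, size=(100_000, k)).sum(axis=1)
    counts = np.bincount(totals)[k:]  # frequencies of totals k, k+1, ..., 6k
    print(k, np.round(counts / counts.sum(), 3))  # flat at k=1, bell-shaped by k=10
```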


Central Limit Theorem (CLT) and the Binomial

Let X = X1 + · · · + Xn, where Xi is a Bernoulli variable (1 if
success and 0 if failure; probability of success p).

    E[Xi] = p,   var(Xi) = p(1 − p)
    E[X] = np,   var(X) = np(1 − p)

The CLT suggests that for large n,

    (X − np)/√(np(1 − p)) ∼ N(0, 1)

Rule of thumb: the binomial can be approximated well by a normal if

    np(1 − p) ≥ 10



Example 6.3c

Q Ideal size of a first year class at a college is 150 students. The


college, knowing from past experience that on the average only
30% of those accepted for admission will actually attend, uses a
policy of approving the applications of 450 students. Compute
the probability that more than 150 first year students attend this
college.
A X is the number of students that attend. Then X is a binomial
RV with n = 450 and p = 0.3 assuming each student
independently decides whether to attend or not.
Note X (binomial) is discrete while normal is continuous. Thus

P(X = i) = P(i − 0.5 < X < i + 0.5)



Example (continued)

Thus,

    P(X > 150) ≈ P(X > 150.5)
              = P((X − (450)(0.3))/√((450)(0.3)(0.7)) ≥ (150.5 − (450)(0.3))/√((450)(0.3)(0.7)))
              ≈ P(Z ≥ 1.59) = 0.06
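
The same numbers can be checked against the exact binomial. A minimal sketch using SciPy (the library choice is ours, not the slides'):

```python
from scipy.stats import binom, norm
import numpy as np

n, p = 450, 0.3                        # np(1 - p) = 94.5 ≥ 10, rule of thumb satisfied
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.sf(150, n, p)            # exact P(X > 150)
approx = norm.sf((150.5 - mu) / sd)    # normal approximation with continuity correction
print(exact, approx)                   # both about 0.06
```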



Approximate Distribution of the Sample Mean

Q: Let X1, ..., Xn be a sample from a population having mean µ and
variance σ². What is the distribution of the sample mean

    X̄ = (Σᵢ₌₁ⁿ Xi)/n ?

A: By the CLT, the distribution of Σᵢ₌₁ⁿ Xi is approximately normal
when n is large.
Then the distribution of X̄ is also (approximately) normal, since a
constant multiple of a normal RV is also a normal RV, with

    E[X̄] = µ,   var(X̄) = σ²/n

Hence,

    (X̄ − µ)/(σ/√n)

has approximately a standard normal distribution.

CLT and Sample Size for X̄ to be Normal

The CLT does not tell us how large the sample size n needs to be
for the normal approximation of X̄ to be valid.
The required n depends on the population distribution of the
sample data.
For the binomial we need np(1 − p) ≥ 10; for a normal population,
any n ≥ 1 works.
Rule of thumb: a sample size n ≥ 30 works for almost all
distributions, i.e., no matter how non-normal the underlying
population is, the sample mean of a sample of size at least 30
will be approximately normal.
In most cases, the normal approximation is valid for much
smaller sample sizes.



Densities of Sample Means of a Normal Population

If X ∼ N(0, 1), then X̄ ∼ N(0, 1/n).

[Figure: densities of X̄ for several values of n, narrowing around 0
as n grows (not reproduced)]


Sample Variance

Let X1, X2, ..., Xn be a random sample from a distribution with
mean µ and variance σ².
Let X̄ be the sample mean.
The statistic S² defined as

    S² = Σᵢ₌₁ⁿ (Xi − X̄)² / (n − 1)

is called the sample variance. It is a random variable.
S = √S² is called the sample standard deviation.
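
In code, the n − 1 convention corresponds to NumPy's ddof argument. A minimal sketch with made-up data:

```python
import numpy as np

x = np.array([4.2, 3.7, 5.1, 4.8, 4.4])  # illustrative sample

n = x.size
s2 = np.sum((x - x.mean()) ** 2) / (n - 1)  # sample variance, n - 1 denominator

# NumPy's ddof ("delta degrees of freedom") gives the same value:
assert np.isclose(s2, x.var(ddof=1))
print(s2, np.sqrt(s2))  # S² and the sample standard deviation S
```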



E[S²]?

We compute E[S²] as follows:

    (Xi − X̄)² = (Xi − µ + µ − X̄)²
              = (Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄)

    Σᵢ₌₁ⁿ (Xi − X̄)² = Σᵢ₌₁ⁿ (Xi − µ)² + Σᵢ₌₁ⁿ (µ − X̄)²
                     + 2 Σᵢ₌₁ⁿ (Xi − µ)(µ − X̄)

The middle term is

    Σᵢ₌₁ⁿ (µ − X̄)² = n(µ − X̄)²


E[S²] (Continued)

The last term is

    2 Σᵢ₌₁ⁿ (Xi − µ)(µ − X̄) = −2(X̄ − µ) Σᵢ₌₁ⁿ (Xi − µ)
                             = −2n(X̄ − µ)²

Hence,

    Σᵢ₌₁ⁿ (Xi − X̄)² = Σᵢ₌₁ⁿ (Xi − µ)² + n(µ − X̄)² − 2n(X̄ − µ)²
                     = Σᵢ₌₁ⁿ (Xi − µ)² − n(X̄ − µ)²

    Σᵢ₌₁ⁿ (Xi − X̄)²/(n − 1) = Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1) − n(X̄ − µ)²/(n − 1)



E[S²] (Continued)

We had

    S² = Σᵢ₌₁ⁿ (Xi − X̄)²/(n − 1) = Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1) − (n/(n − 1))(X̄ − µ)²

Taking expectations,

    E[S²] = E[Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1)] − (n/(n − 1)) E[(X̄ − µ)²]

But

    E[Σᵢ₌₁ⁿ (Xi − µ)²/(n − 1)] = nσ²/(n − 1),
    E[(X̄ − µ)²] = var(X̄) = σ²/n

E[S²] (Continued)

This implies

    E[S²] = nσ²/(n − 1) − σ²/(n − 1) = σ²

or

    E[S²] = σ²

Therefore, S² is an unbiased estimator of σ².
This is the real reason for having n − 1 in the denominator of the
S² expression instead of n, and not really the degrees-of-freedom
argument given earlier.
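
A simulation makes the n − 1 vs n contrast visible. A minimal sketch, assuming a normal population with σ² = 9 and a deliberately small n:

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2 = 9.0  # assumed population variance
n = 5         # small n makes the bias of the 1/n estimator obvious

samples = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divides by n

# Averages: about 9 (= σ²) with n - 1, but about 7.2 (= (n-1)σ²/n) with n.
print(s2_unbiased.mean(), s2_biased.mean())
```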



Sampling Distributions from a Normal Population

Consider the distribution of the statistics (X̄, S²) obtained from
samples from a normal population.

Theorem
If X1, X2, ..., Xn is a sample (IID) from a normal population having
mean µ and variance σ², then X̄ and S² are independent random
variables, with X̄ being normal with mean µ and variance σ²/n, and
(n − 1)S²/σ² being a chi-square random variable with n − 1 degrees
of freedom.

In general, X̄ is only approximately normal, by the CLT. Here,
however, it is exactly normal, since a sum of normal RVs is normal.
(Show this using the moment generating function.)


Sampling Distributions from a Normal Population (Cont.)

- (n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom.
- X̄ and S² are independent random variables.
- A (non-rigorous) proof of the last two statements is in Ross.
- The independence of X̄ and S² is due to the normal population;
  it is not true in general.
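
Both claims can be probed by simulation. A minimal sketch (µ, σ, and n are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, mu, sigma = 10, 5.0, 2.0

samples = rng.normal(mu, sigma, size=(200_000, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2  # should follow a chi-square with n - 1 = 9 dof

# A chi-square with k degrees of freedom has mean k and variance 2k.
print(q.mean(), q.var())            # about 9 and about 18
# Near-zero correlation is consistent with (though weaker than) independence.
print(np.corrcoef(xbar, s2)[0, 1])  # about 0
```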



The Chi-Square Distribution

If Z1, Z2, ..., Zn are independent standard normal random
variables, then X, defined as

    X = Z1² + Z2² + ... + Zn²

is said to have a chi-square distribution with n degrees of
freedom, or

    X ∼ χ²ₙ

Let X1, X2 be independent chi-square random variables with n1
and n2 degrees of freedom, respectively. Then X1 + X2 is
chi-square with n1 + n2 degrees of freedom.
For X a chi-square RV with n degrees of freedom, the quantity χ²α,n
is defined such that

    P(X ≥ χ²α,n) = α
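
Percentiles χ²α,n are tabulated in Ross; in code they come from the chi-square inverse CDF. A minimal sketch using SciPy (our library choice):

```python
from scipy.stats import chi2

alpha, n = 0.05, 14

# χ²_{α,n} satisfies P(X ≥ χ²_{α,n}) = α; isf is the inverse survival function.
crit = chi2.isf(alpha, n)
print(crit)              # about 23.68 for α = 0.05, n = 14

# Check: the upper-tail probability at the critical value recovers α.
print(chi2.sf(crit, n))  # about 0.05
```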



Chi-Squared PDF Sketch

[Figure: sketch of chi-squared density functions (not reproduced)]


Chi-Squared RV

- The density function of a chi-squared RV involves the gamma
  function.
- (Without proof) For a chi-squared RV X with n degrees of
  freedom: E[X] = n, var(X) = 2n.



Example 6.5a: Illustrating the Use of the S² Distribution

Q: The time it takes a processor to carry out a particular
computation is normally distributed with mean 20 seconds and
standard deviation 3 seconds. If a sample of 15 such
computations is observed, what is the probability that the
sample variance is greater than 12?
A: Here n = 15 and σ² = 3² = 9. Then

    P(S² > 12) = P(14S²/9 > (14)(12)/9) = P(χ²₁₄ > 18.67)
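
The remaining chi-square tail probability can be read from a table or computed. A minimal sketch using SciPy:

```python
from scipy.stats import chi2

n, sigma2 = 15, 9.0
threshold = (n - 1) * 12.0 / sigma2  # (14)(12)/9, about 18.67

# P(S² > 12) = P(χ²₁₄ > 18.67), via the chi-square survival function.
print(chi2.sf(threshold, n - 1))     # about 0.18
```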



Summary

- For any sample: E[X̄] = µ, E[S²] = σ².
- CLT: a sum of IID RVs is approximately normal. As a result, X̄ is
  approximately normal with mean µ and variance σ²/n.
- When sampling from a normal population, X̄ is exactly normal and
  (n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom.
  Further, X̄ and S² are independent.



THANK YOU

