
Introduction to Statistical Inference

Edwin Leuven
Introduction

- Define key terms associated with inferential statistics.
- Revise concepts related to random variables, the sampling distribution, and the Central Limit Theorem.
Introduction

Until now we've mostly dealt with descriptive statistics and with probability.

In descriptive statistics one investigates the characteristics of the data

- using graphical tools and numerical summaries

The frame of reference is the observed data.

In probability, the frame of reference is all data sets that could have potentially emerged from a population.
Introduction

The aim of statistical inference is to learn about the population using the observed data.

This involves:

- computing something with the data
  - a statistic: a function of the data
- interpreting the result
  - in probabilistic terms: the sampling distribution of the statistic
Introduction

                        --- Probability --->
    Population                                  Sample (Data)
    f_X(x)                                      (x_1, ..., x_n)
    Parameter: E[X] = µ                         Statistic: x̄ = (1/n) ∑_{i=1}^n x_i
                        <--- Inference ----
Point estimation
We want to estimate a population parameter using the observed data

- e.g. some measure of variation, an average, a minimum, a maximum, a quantile, etc.

Point estimation attempts to obtain a best guess for the value of that parameter.

An estimator is a statistic (a function of the data) that produces such a guess.

By "best" we usually mean an estimator whose sampling distribution is more concentrated around the population parameter value than that of other estimators.

Hence, the choice of a specific statistic as an estimator depends on the probabilistic characteristics of this statistic, i.e. on its sampling distribution.
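As a minimal sketch (not from the slides; the Normal model and sample size are our own choices), we can compare two estimators of a Normal mean by simulating their sampling distributions:

set.seed(1)
# both the sample mean and the sample median are unbiased for a Normal mean,
# but the mean's sampling distribution is more concentrated
means   <- replicate(10^4, mean(rnorm(100, mean = 5, sd = 2)))
medians <- replicate(10^4, median(rnorm(100, mean = 5, sd = 2)))
sd(means)    # about 0.20 (= 2/sqrt(100))
sd(medians)  # about 0.25

By the "best" criterion above, the sample mean is the better of the two estimators here.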
Confidence Interval

We can also quantify the uncertainty (sampling distribution) of our point estimate.

One way of doing this is by constructing an interval that is likely to contain the population parameter.

Such an interval, computed on the basis of the data, is called a confidence interval.

The sampling probability that the confidence interval will indeed contain the parameter value is called the confidence level.

We construct confidence intervals for a given confidence level.
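As a sketch of what is to come (the numbers anticipate the polling example below), a 95% confidence interval for a proportion based on the Normal approximation:

n     <- 1000                           # assumed sample size
p_hat <- 0.54                           # assumed point estimate
se    <- sqrt(p_hat * (1 - p_hat) / n)  # estimated s.d. of p_hat
c(p_hat - 1.96 * se, p_hat + 1.96 * se) # roughly [0.509, 0.571]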
Hypothesis Testing

The scientific paradigm involves the proposal of new theories that presumably provide a better description of the laws of Nature.

If the empirical evidence is inconsistent with the predictions of the old theory but not with those of the new theory

- then the old theory is rejected in favor of the new one
- otherwise, the old theory maintains its status

Statistical hypothesis testing is a formal method, built on this paradigm, for determining which of the two hypotheses should prevail.
Statistical hypothesis testing

Each of the two hypotheses, the old and the new, predicts a different distribution for the empirical measurements.

In order to decide which of the distributions is more in tune with the data, a statistic is computed.

This statistic t is called the test statistic.

A threshold c is set, and the old theory is rejected if t > c.

Hypothesis testing thus consists in asking a binary question about the sampling distribution of t.
Statistical hypothesis testing

This decision rule is not error proof, since the test statistic may fall by chance on the wrong side of the threshold.

Suppose we know the sampling distribution of the test statistic t.

We can then set the probability of making an error to a given level by choosing c accordingly.

The probability of erroneously rejecting the currently accepted theory (the old one) is called the significance level of the test.

The threshold is selected so as to assure a small enough significance level.
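A minimal sketch, assuming the test statistic t is standard Normal under the old theory:

alpha <- 0.05          # desired significance level
c <- qnorm(1 - alpha)  # threshold: about 1.64
1 - pnorm(c)           # Pr(t > c) under the old theory equals alpha

Raising the threshold c lowers the significance level of the test.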
Multiple measurements

The method of hypothesis testing is also applied in other practical settings where decisions must be made.

Consider a randomized trial of a new treatment for a medical condition, where the

- treated get the new treatment
- controls get the old treatment

and we measure their responses.

We now have 2 measurements that we can compare.

We will use statistical inference to make a decision about whether the new treatment is better.
Statistics

Statistical inferences, be it point estimation, confidence intervals, or hypothesis tests, are based on statistics computed from the data.

A statistic is a formula that is applied to the data, and we think of it as a statistical summary of the data.

Examples of statistics are

- the sample average and
- the sample standard deviation

For a given dataset a statistic has a single numerical value

- but it will be different for a different random sample!

The statistic is therefore a random variable.
Statistics

It is important to distinguish between

1. the statistic (a random variable)
2. the realisation of the statistic for a given sample (a number)

We therefore denote the statistic with capital letters, e.g. the sample mean:

- X̄ = (1/n) ∑_{i=1}^n X_i

and the realisation of the statistic with small letters:

- x̄ = (1/n) ∑_{i=1}^n x_i
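A quick sketch of the distinction in R (the Normal model here is our own choice):

x <- rnorm(50)                               # one observed sample
mean(x)                                      # x_bar: a single number
x_bars <- replicate(1000, mean(rnorm(50)))   # X_bar over many samples
sd(x_bars)                                   # it varies: a random variable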
Example: Polling

Example: Polling

Imagine we want to predict whether the left block or the right block will get a majority in parliament.

Key quantities:

- N = 4,166,612 - the population size
- p = (# people who support the right) / N
- 1 − p = (# people who support the left) / N

We can ask the following questions:

1. What is p?
2. Is p > 0.5?
3. We estimate p, but how sure are we?
Example: Polling
We poll a random sample of n = 1,000 people from the population without replacement:

- choose person 1 at random from the N, choose person 2 at random from the N − 1 remaining, etc.
- or, choose a random set of n people from all C(N, n) = N!/(n!(N − n)!) possible sets

Let

    X_i = 1 if person i supports the right
    X_i = 0 if person i supports the left

and denote our data by x_1, ..., x_n.

Then we can estimate p by

    p̂ = (x_1 + ... + x_n)/n
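A sketch of this sampling scheme in R; the value of p used to build the population is an assumption (in practice it is unknown):

N <- 4166612; n <- 1000
p <- 0.54                                              # assumed, unknown in practice
population <- rep(c(1, 0), c(round(p * N), N - round(p * N)))
x <- sample(population, n)                             # n draws without replacement
p_hat <- mean(x)                                       # = (x_1 + ... + x_n)/n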
Example: Polling

To construct the poll we randomly sampled the population.

With a random sample each of the N people in the population is equally likely to be the ith person sampled, therefore

    E[X_i] = 1 · p + 0 · (1 − p) = p

and therefore

    E[p̂] = E[(X_1 + ... + X_n)/n]
         = (E[X_1] + ... + E[X_n])/n = p

The "average" value of p̂ is p, and we say that p̂ is unbiased.

Unbiasedness refers to the average error over repeated sampling, and not to the error for the observed data!
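A sketch that illustrates unbiasedness by repeated sampling (since N is much larger than n, sampling with replacement via rbinom is a close stand-in):

set.seed(2)
p <- 0.54
p_hats <- replicate(2000, mean(rbinom(1000, 1, p)))
mean(p_hats)       # close to p: the average over repeated samples
mean(p_hats) - p   # average error near zero; any single p_hat still misses p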
Example: Polling

Say 540 of those polled support the right, so p̂ = 0.54.

Does this mean that in the population:

- p = 0.54?
- p > 0.5?

The data are a realization of a random sample, and p̂ is therefore a random variable!

For a given sample we will therefore have estimation error

    estimation error = p̂ − p ≠ 0

which comes from the difference between our sample and the population.
Example: Polling

When sampling with replacement the X_i are independent, and

- Var[p̂] = p(1 − p)/n

When sampling without replacement the X_i are not independent: with N_1 = pN people supporting the right,

    Pr(X_i = 1 | X_j = 1) = (N_1 − 1)/(N − 1) ≠ N_1/(N − 1) = Pr(X_i = 1 | X_j = 0)

and we can show that

- Var[p̂] = (p(1 − p)/n) · (1 − (n − 1)/(N − 1))

For N = 4,166,612, n = 1,000, and p = 0.54, the standard deviation of p̂ ≈ 0.016.

But what is the distribution of p̂?
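Plugging in the numbers (a sketch matching the claim above):

N <- 4166612; n <- 1000; p <- 0.54
sqrt(p * (1 - p) / n * (1 - (n - 1) / (N - 1)))  # about 0.0158
sqrt(p * (1 - p) / n)                            # with replacement: nearly identical, since N >> n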
The Sampling Distribution

Statistics vary from sample to sample.

The sampling distribution of a statistic

- is the nature of this variability
- can sometimes be determined and often approximated

The distribution of the values we get when computing a statistic in (infinitely) many random samples is called the sampling distribution of that statistic.
The Sampling Distribution

We can sample from a

- population
  - e.g. eligible voters in Norway today
- model (theoretical population)
  - e.g. Pr(vote right block) = p

The sampling distribution of a statistic depends on the population distribution of the values of the variables that are used to compute the statistic.
Sampling Distribution of Statistics

Theoretical models describe the distribution of a measurement as a function of one or more parameters.

For example,

- in n trials with success probability p, the total number of successes follows a Binomial distribution with parameters n and p
- if events happen at rate λ per unit of time, then the number of events that occur in a time interval of length t follows a Poisson distribution with parameter λt
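These model distributions are built into R; a sketch of the two examples (the parameter values are our own):

dbinom(5, size = 10, prob = 0.5)   # Pr(5 successes in 10 trials with p = 0.5)
dpois(3, lambda = 2 * 1.5)         # Pr(3 events at rate 2 per unit over t = 1.5)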
Sampling Distribution of Statistics

More generally the sampling distribution of a statistic depends on

- the sample size
- the sampling distribution of the data used to construct the statistic

and can be complicated!

We can sometimes learn about the sampling distribution of a statistic by

- deriving the finite sample distribution
- approximation with a Normal distribution in large samples
- approximation through numerical simulation
Finite sample distributions
Sometimes we can derive the finite sample distribution of a statistic.

Let the fraction of people voting right in the population be p.

Because we know the distribution of the data (up to the unknown parameter p) we can derive the sampling distribution.

In a random sample of size n the probability of observing k people voting for the right follows a binomial distribution:

    Pr(X = k) = C(n, k) p^k (1 − p)^(n−k)

This depends on p, which is unknown.

This approach is however often not feasible: the statistic may be complicated, it may depend on several variables, and the population distribution of these variables may be unknown.
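A sketch of this exact calculation in R, for a hypothetical value of p:

n <- 1000; p <- 0.54                       # p assumed known here
dbinom(540, size = n, prob = p)            # Pr(X = 540)
sum(dbinom(501:n, size = n, prob = p))     # Pr(p_hat > 0.5)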
Theoretical Distributions of Observations (Models)

Distribution    Sample Space    f(x)
Binomial        0, 1, ..., n    C(n, k) p^k (1 − p)^(n−k)
Poisson         0, 1, 2, ...    λ^k exp(−λ)/k!
Uniform         [a, b]          1/(b − a)
Exponential     [0, ∞)          λ exp(−λx)
Normal          (−∞, ∞)         (1/√(2πσ²)) exp(−((x − µ)/σ)²/2)

Distribution    E[X]            Var(X)          R
Binomial        np              np(1 − p)       d,p,q,rbinom
Poisson         λ               λ               d,p,q,rpois
Uniform         (a + b)/2       (b − a)²/12     d,p,q,runif
Exponential     1/λ             1/λ²            d,p,q,rexp
Normal          µ               σ²              d,p,q,rnorm
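The d/p/q/r prefixes in the last column follow R's naming convention; a sketch using the Normal:

dnorm(0)        # density at 0
pnorm(1.96)     # CDF: Pr(X <= 1.96), about 0.975
qnorm(0.975)    # quantile: about 1.96
rnorm(3)        # three random draws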
Example: Polling
hist(replicate(10000, mean(rbinom(1000, 1, .54))),
     main = "", xlab = "p_hat", prob = TRUE, breaks = 50)

[Figure: histogram of 10,000 simulated values of p̂, roughly bell-shaped and centered near 0.54]
The Normal Approximation

In general, the sampling distribution of a statistic is not the same as the sampling distribution of the measurements from which it is computed.

If the statistic is

1. (a function of) a sample average and
2. the sample is large

then we can often approximate the sampling distribution with a Normal distribution.
Example: Polling

In the graph p̂ looked like it had a Normal distribution with mean 0.54 and s.d. 0.016.

If N ≫ n then the X_i are approximately independent, and if n is large then

    √n (p̂ − p) ∼ N(0, p(1 − p))

or equivalently

    p̂ ∼ N(p, p(1 − p)/n)

by the Central Limit Theorem.
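A sketch comparing the Normal approximation with the exact binomial calculation (the event chosen is ours):

n <- 1000; p <- 0.54
pnorm(0.5, mean = p, sd = sqrt(p * (1 - p) / n))   # approximate Pr(p_hat <= 0.5)
pbinom(500, size = n, prob = p)                    # exact: close, both ~ 0.006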
Example: Polling

curve(dnorm(x, mean = .54, sd = 0.016),
      col = "darkblue", lwd = 2, add = TRUE, yaxt = "n")

[Figure: the Normal density with mean 0.54 and s.d. 0.016 overlaid on the histogram of simulated p̂ values]
Approximation through numerical simulation

Computerized simulations can be carried out to approximate sampling distributions.

With a model we can draw many random samples, compute the statistic, and characterize its sampling distribution.

Assume price ∼ Exponential(λ) and consider samples of size n = 201.

Since E[price] = 1/λ and Var[price] = 1/λ², the standard deviation of the sample average is

    √(Var(price)/n) = √((1/λ²)/201) ≈ 0.0705/λ
Approximation through numerical simulation

Remember that 95% of the probability density of a Normal distribution is within 1.96 s.d. of its mean.

The Normal approximation for the sampling distribution of the average price then suggests that

    1/λ ± 1.96 · 1/(λ√n)

should contain 95% of the distribution.
Approximation through numerical simulation

We may use simulations to validate this approximation.

Assume λ = 1/12,000.

X.bar = replicate(10^5, mean(rexp(201, 1/12000)))
mean(abs(X.bar - 12000) <= 1.96 * 0.0705 * 12000)

## [1] 0.95173

This shows that the Normal approximation is adequate in this example.

How about other values of n or λ?
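A sketch (not from the slides) that wraps the check in a function, so other values of n and λ are easy to try:

coverage <- function(n, lambda, reps = 10^4) {
  X.bar <- replicate(reps, mean(rexp(n, lambda)))
  # fraction of simulated averages within 1.96 s.d. of the mean 1/lambda
  mean(abs(X.bar - 1/lambda) <= 1.96 / (lambda * sqrt(n)))
}
coverage(201, 1/12000)   # about 0.95, as before
coverage(5, 1/12000)     # try smaller n or other values of lambda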
Approximation through numerical simulation

Simulations may also be used to compute probabilities in cases where the Normal approximation does not hold.

Consider the following "mid-range" statistic:

    (min(x_i) + max(x_i))/2

where X_i ∼ Uniform(3, 7) and n = 100.

What interval contains 95% of its sampling distribution?
Approximation through numerical simulation
Let us carry out the simulation that produces an approximation of the central region containing 95% of the sampling distribution of the mid-range statistic for the Uniform distribution:

mid.range <- rep(0, 10^5)
for(i in 1:10^5) {
  X <- runif(100, 3, 7)                  # a sample of 100 Uniform(3,7) draws
  mid.range[i] <- (max(X) + min(X))/2    # the mid-range statistic
}
quantile(mid.range, c(0.025, 0.975))

##      2.5%     97.5%
## 4.9409107 5.0591218

Observe that (approximately) 95% of the sampling distribution of the statistic is in the range [4.941, 5.059].
Approximation through numerical simulation

Simulations can be used to compute any numerical summary of the sampling distribution of a statistic.

To obtain the expectation and the standard deviation of the mid-range statistic of a sample of 100 observations from the Uniform(3, 7) distribution:

mean(mid.range)

## [1] 4.9998949

sd(mid.range)

## [1] 0.027876151
Approximation through numerical simulation

Computerized simulations can also approximate sampling distributions when we have no model, using only the data:

1. draw a random sample of size n with replacement from our data
2. compute our statistic
3. do 1. & 2. many times

The resulting distribution of the statistic across our resamples is an approximation of the sampling distribution of our statistic.

The idea is that a random sample of a random sample from the population is again a random sample from the population.

This is called the bootstrap, and it computes the sampling distribution without a model!
Approximation through numerical simulation

n = 1000
data = rbinom(n, 1, .54)              # true distribution, usually unknown
estimates = rep(0, 999)
for(i in 1:999) {
  id = sample(1:n, n, replace=T)      # resample the data with replacement
  estimates[i] = mean(data[id])       # statistic in the bootstrap sample
}
sd(estimates)

## [1] 0.015946413

sqrt(.54*(1-.54)/1000)                # true value, usually unknown

## [1] 0.015760711
Summary

Today we looked at the elements of statistical inference:

- Estimation: determining the distribution, or some characteristic of it. (What is our best guess for p?)
- Confidence intervals: quantifying the uncertainty of our estimate. (What is a range of values to which we're reasonably sure p belongs?)
- Hypothesis testing: asking a binary question about the distribution. (Is p > 0.5?)
Summary

In statistical inference we think of data as a realization of a random process.

There are many reasons why we think of our data as (ex-ante) random:

1. We introduced randomness in our data collection (random sampling, or randomly assigning treatment)
2. We are actually studying a random phenomenon (coin tosses or dice rolls)
3. We treat as random the part of our data that we don't understand (errors in measurement)

In the coming weeks we will take a closer look at how this randomness affects what we can learn about the population from the data.
