Lecture 3

Population vs Sample

Population
- Complete set
- Characteristics are called “parameters”
- Parameters are the true values

Sample
- Subset of the population that must be representative of the population
- Characteristics are called “statistics”
- Statistics come with a margin of error, which is half the size of the
confidence interval

By analyzing statistics from a sample, it is possible to draw conclusions
about the larger population and estimate its parameters → inferential statistics

© Nicolas Navet University of Luxembourg 3


Confidence intervals
- Confidence intervals (CI) quantify the uncertainty about a summary
statistic (mean, median, variance, etc.) that is due to the randomness of the
sampling (from the population):
- “Based on a poll, we are 95% confident that the approval rating for the current
Prime Minister is from 41.5% to 46.5%.”
- Interpretation: “We are 'some level of percent confident' (=confidence level)
that the parameter of interest is within 'lower bound to upper bound'
(=confidence interval)”
- The confidence level is the probability that an interval computed this way contains
the unknown parameter (=value for the entire population); it expresses the degree of
certainty. A CI always comes with a confidence level.
- 95% and 99% are common choices for the confidence level
- There is a confidence interval for every summary statistic: median, mean, quartile,
standard deviation

Visual Illustration
We randomly draw samples from a population and calculate the 80% confidence
interval (CI). On average, 8 out of 10 times, the CI will include the true value of the
parameter of interest.
• Sample estimates in red
• CI width in green
The randomness comes from the sample chosen, NOT from the parameter of interest
(𝜋 here), which has a fixed but unknown value.
✓ Larger samples will reduce the width of the confidence interval
✓ Higher confidence levels will increase the width of the confidence interval
Figure from https://onlinecourses.science.psu.edu/stat500/print/book/export/html/29/
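
The "8 out of 10 times" statement can be checked by simulation. The sketch below is only an illustration, not the method behind the figure: it assumes a population proportion 𝜋 = 0.4, samples of size 100, and the usual normal-approximation interval for a proportion. The function name and all parameter values are made up for this example.

```python
import random
from statistics import NormalDist


def coverage(pi=0.4, n=100, level=0.80, trials=2000, seed=0):
    """Fraction of simulated CIs that contain the fixed parameter pi.

    Assumption: normal-approximation interval for a proportion,
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    rng = random.Random(seed)
    # z is the standard normal quantile leaving (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n; only the sample is random, pi is fixed.
        successes = sum(rng.random() < pi for _ in range(n))
        p_hat = successes / n
        moe = z * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - moe <= pi <= p_hat + moe:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Observed coverage: {coverage():.3f}")
```

The observed coverage should land near 0.80, and it is the interval that changes from one sample to the next while 𝜋 stays put.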

Margin of error vs confidence interval
- Margin of error (MoE) is half the width of the CI for a statistic, i.e.
CI = [Sample Estimate-MoE, Sample Estimate+MoE]
- MoE can be expressed as an "absolute" quantity, e.g. 3 people.
- MoE can also be expressed as a percentage of the estimate: with an absolute MoE of 3,
the "percent relative" margin of error is 10% if the value of the statistic of interest is 30.
- Example: "The current approval rating for the president is 44% with a 95%
(confidence level) margin of error of 3%."
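
As a small worked example of the relations above, the sketch below rebuilds a CI from an estimate and its MoE, and converts an absolute MoE into a percent relative one. The two helper names are hypothetical, chosen only for this illustration.

```python
def ci_from_moe(estimate, moe):
    """CI = [estimate - MoE, estimate + MoE]."""
    return (estimate - moe, estimate + moe)


def relative_moe_percent(moe, value):
    """Absolute MoE expressed as a percentage of the statistic's value."""
    return 100.0 * moe / value


if __name__ == "__main__":
    # Approval rating of 44% with a margin of error of 3 points.
    print(ci_from_moe(44, 3))
    # Absolute MoE of 3 on a statistic whose value is 30.
    print(relative_moe_percent(3, 30))
```

The first call gives the interval from 41 to 47, the second gives 10 (percent).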

Higher confidence levels → larger confidence intervals

[Figure: confidence intervals at the 90%, 95% and 99% confidence levels. Each point is
calculated with a sample of size 100.]

Smaller samples → larger confidence intervals

[Figure: confidence intervals at the 99% confidence level. Here, each point is calculated
with a sample of size 30, vs 100 in the previous slide.]
Determining the CI for the mean assuming i.i.d.-ness and normal distribution
The plot shows the standard normal distribution, which has a mean of 0 and a standard
deviation of 1. The curve represents the probability density function (PDF) for this
distribution.
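
One use of the standard normal distribution here is to find the central interval [-z, z] that holds a given share of the probability mass. The sketch below computes z for the common confidence levels using Python's `statistics.NormalDist`. Reading z as the constant that scales the CI of the mean is an assumption of this example, and `central_quantile` is a hypothetical helper name.

```python
from statistics import NormalDist


def central_quantile(level):
    """z such that the standard normal puts probability `level` on [-z, z]."""
    # The upper bound of the central interval is the (1 + level) / 2 quantile.
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf((1.0 + level) / 2.0)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%}: +/- {central_quantile(level):.3f}")
```

The values are about 1.645, 1.960 and 2.576 for the 90%, 95% and 99% levels, which is why higher confidence levels give wider intervals.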

The i.i.d. assumption
- Assumption made to calculate CIs with the standard formulas: the data in the
population are independent and identically distributed (i.i.d.)
- In principle, i.i.d.-ness is a property of a stochastic model, not of the collected
data. It means that all data possess “the same properties”, as if they were
generated by a random generator using the same probability distribution:
  - Independence: value 𝑋𝑛 does not depend on 𝑋𝑛−1, 𝑋𝑛−2, etc.
  - Identically distributed: the probability distribution is the same for all random
  draws
- How do I know if the i.i.d. assumption is valid? Autocorrelation plot (linear serial
relationships), “turning point” and Ljung-Box tests (not very powerful), BDS test
(nonlinear relationships), stationarity tests (e.g., mean and variance do not
change over time), etc.
- In practice, data coming from high-resolution measurements are frequently
positively correlated…
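
A minimal sketch of the autocorrelation check mentioned above, using only the standard library. It compares the lag-1 sample autocorrelation of simulated i.i.d. data with that of a positively correlated series. The AR(1) series with coefficient 0.9 and the ±1.96/√n reference band are assumptions chosen for this illustration, and a single lag is of course not a full test of i.i.d.-ness.

```python
import random


def autocorrelation(xs, lag):
    """Sample autocorrelation of the sequence xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t - lag] - mean) for t in range(lag, n))
    return num / denom


rng = random.Random(1)
n = 2000

# i.i.d. draws from the same normal distribution.
iid = [rng.gauss(0, 1) for _ in range(n)]

# AR(1) series: each value depends on the previous one (positive correlation).
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0, 1))

if __name__ == "__main__":
    band = 1.96 / n ** 0.5  # approximate 95% band around 0 for i.i.d. data
    print(f"band:           +/- {band:.3f}")
    print(f"i.i.d., lag 1:  {autocorrelation(iid, 1):.3f}")
    print(f"AR(1),  lag 1:  {autocorrelation(ar1, 1):.3f}")
```

The AR(1) value should sit far outside the band, which is the pattern to look for in an autocorrelation plot of measurement data.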

Confidence intervals for the mean
- Formula for the confidence interval of the sample mean, assuming independent and
identically distributed (i.i.d.) and normally distributed data, where 𝛾 is the confidence
level (from [1]):

CI = [Sample Mean − 𝜂 · 𝑆𝑛 / √𝑛, Sample Mean + 𝜂 · 𝑆𝑛 / √𝑛]

where 𝑆𝑛 is the standard deviation of the sample, 𝑛 is the size of the sample and 𝜂 is a
constant that depends on the confidence level 𝛾

✓ The standard deviation is a multiplying factor of the length of the CI
✓ The length of the CI decreases with 1 / sqrt(n): to decrease the width of the CI by a factor
of 2, the number of data points must be increased by a factor of 4.
✓ This formula is often applied in practice, even though the Gaussian assumption will not
be met in many cases.
✓ There are visual tests (e.g., normal QQ-plots) and statistical tests (e.g., Jarque-Bera) to test
whether the normal assumption is valid
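
The first two points can be illustrated numerically. The sketch below assumes the half-width of the CI has the form 𝜂 · 𝑆𝑛 / √𝑛, which is consistent with the points above. The function name, 𝜂 = 1.96 (95% level under the normal assumption) and the sample standard deviation of 2.0 are example choices.

```python
from math import sqrt


def ci_half_width(sample_sd, n, eta):
    """Half-width of the CI for the mean: eta * S_n / sqrt(n) (assumed form)."""
    return eta * sample_sd / sqrt(n)


eta_95 = 1.96  # assumed constant for the 95% confidence level

w100 = ci_half_width(2.0, 100, eta_95)
w400 = ci_half_width(2.0, 400, eta_95)
w100_double_sd = ci_half_width(4.0, 100, eta_95)

if __name__ == "__main__":
    print(f"n=100: {w100:.3f}   n=400: {w400:.3f}   ratio: {w100 / w400:.1f}")
    print(f"doubling the standard deviation: {w100_double_sd / w100:.1f}x wider")
```

Quadrupling the sample size from 100 to 400 halves the width, and doubling the standard deviation doubles it.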

Determining confidence intervals

𝑆𝑛 is the standard deviation of the sample, 𝑛 is the size of the sample, and
𝜂 is a constant that depends on the confidence level 𝛾.
[Table: confidence level vs 𝜂]
1. Derive the formula to calculate the CI of the mean at the 90% confidence level.
2. What is the CI at the 90% level of the mean of the data set
{1.0, -1.1, 1.4, 0.2, -0.5, -0.9, -0.8, 0.4, -0.3, -0.3, -0.2, 0.0}?

“We can be 90% confident that the population mean falls between
-0.44 and 0.26”
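
One possible way to work through question 2 in code is sketched below. It rests on assumptions not stated on the slide: that 𝜂 is the (1 + 𝛾)/2 quantile of the standard normal distribution, and that 𝑆𝑛 is the sample standard deviation with the n − 1 denominator. The function name `mean_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [1.0, -1.1, 1.4, 0.2, -0.5, -0.9, -0.8, 0.4, -0.3, -0.3, -0.2, 0.0]


def mean_ci(xs, level):
    """CI for the mean: sample mean +/- eta * S_n / sqrt(n).

    Assumptions: eta is the (1 + level) / 2 standard normal quantile,
    and S_n is the sample standard deviation (n - 1 denominator).
    """
    n = len(xs)
    eta = NormalDist().inv_cdf((1.0 + level) / 2.0)
    half_width = eta * stdev(xs) / sqrt(n)
    centre = mean(xs)
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = mean_ci(data, 0.90)
    print(f"90% CI for the mean: [{low:.2f}, {high:.2f}]")
```

Under these assumptions the bounds come out close to the interval quoted above; a small difference in the last digit of the lower bound can arise from rounding or from a different convention for 𝑆𝑛.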