Lecture 03. Statistical Inference
Normal Probabilities
Example:
Let X equal the weight (in grams) of a randomly selected infant. Assume X ~ N(3000, 1000^2),
i.e. mean 3000 and standard deviation 1000.
- What is the probability that a randomly selected infant has weight below 3500?
P(X ≤ 3500) = P(Z ≤ (3500-3000)/1000) = P(Z ≤ 0.5) = 0.6915
- What is the probability that a randomly selected infant has weight above 5000?
P(X ≥ 5000) = P(Z ≥ (5000-3000)/1000) = P(Z ≥ 2) = 1 - P(Z ≤ 2) = 0.0228
- What is the probability that a randomly selected infant has weight between
2500 and 4000?
P(2500 ≤ X ≤ 4000) = P(-0.5 ≤ Z ≤ 1) = P(Z ≤ 1) - P(Z ≤ -0.5) = 0.8413 - 0.3085 = 0.5328
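The three table lookups above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, assuming that 1000 is the standard deviation (not the variance); the variable names are illustrative only.

```python
from statistics import NormalDist

# Infant weight model: mean 3000, standard deviation 1000
# (assumption: the second parameter in the example is the SD)
weight = NormalDist(mu=3000, sigma=1000)

p_below_3500 = weight.cdf(3500)                  # P(X <= 3500)
p_above_5000 = 1 - weight.cdf(5000)              # P(X >= 5000)
p_between = weight.cdf(4000) - weight.cdf(2500)  # P(2500 <= X <= 4000)

print(round(p_below_3500, 4), round(p_above_5000, 4), round(p_between, 4))
```

The printed values should agree with the standard normal table to four decimal places.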
Population vs Sample
Sampling Distribution
The distribution of the statistic for all possible samples randomly drawn from the
same population of a given sample size.
Example:
- Consider a population that follows the normal distribution N(μ, σ^2)
- Repeatedly take samples of a given size from this population
- Calculate the mean of each sample; this statistic is called the sample mean
- The distribution of these means is the "sampling distribution of the sample mean"
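The procedure in this example can be imitated by simulation. The sketch below is only an illustration: the population parameters, the sample size n = 25, and the number of repeated samples are all hypothetical choices, not values from the lecture.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 3000, 1000   # hypothetical population parameters
n = 25                   # size of each sample (illustrative choice)
num_samples = 2000       # how many samples to draw

# Draw many samples of size n and record the mean of each one
sample_means = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

# The collected means approximate the sampling distribution of the sample mean
print("mean of sample means:", round(fmean(sample_means), 1))
print("std of sample means: ", round(stdev(sample_means), 1))
print("sigma / sqrt(n):     ", sigma / n ** 0.5)
```

The spread of the sample means should be close to σ/√n, which is much smaller than the spread σ of individual observations.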
The Central Limit Theorem
Central Limit Theorem
If a random sample of size n is drawn from any population with mean μ and standard
deviation σ, the distribution of the sample mean X̄ approaches a normal distribution
with mean μ and standard deviation σ_X̄ = σ/sqrt(n) (the standard error) as the sample
size increases:
X̄ ~ N(μ, (σ^2)/n)
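To see the theorem at work on a population that is not normal, the following sketch draws samples from an exponential distribution with rate 1 (so μ = 1 and σ = 1). The choice of population, the sample sizes, and the helper name `sample_mean_spread` are assumptions made for illustration. The code checks only the center and the standard error of the sample means, not the shape of their distribution.

```python
import random
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean_spread(n, num_samples=5000):
    """Return (mean, std) of the sample means of num_samples samples of size n from Exp(1)."""
    means = [
        fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]
    return fmean(means), stdev(means)

for n in (2, 10, 50):
    center, spread = sample_mean_spread(n)
    # Compare the simulated spread with the theoretical standard error sigma / sqrt(n)
    print(n, round(center, 3), round(spread, 3), round(1 / n ** 0.5, 3))
```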
Example:
Within what interval would we expect GMAT sample means to fall for samples of n =
5 applicants? The population is approximately normal with parameters μ = 520.78
and σ = 86.80, so the predicted range for 95 percent of the sample means is
[520.78 - 1.96*86.80/√5, 520.78 + 1.96*86.80/√5] = [444.70, 596.86]
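The arithmetic for this range can be reproduced in a few lines. This sketch assumes the sample size n = 5 stated in the example text and the rounded critical value 1.96; the variable names are illustrative.

```python
from math import sqrt

mu, sigma = 520.78, 86.80   # population parameters from the example
n = 5                       # sample size as stated in the example text
z = 1.96                    # critical value for 95 percent

standard_error = sigma / sqrt(n)
low = mu - z * standard_error
high = mu + z * standard_error

print(round(low, 2), round(high, 2))
```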
Estimation
❖ Point Estimation
❖ Interval Estimation
❖ Mean (μ) vs Proportion (π)
■ With known σ
■ With unknown σ
➢ Difference in Mean (μ1 - μ2)
➢ Difference in Proportion (π1 - π2)
❖ Sample size
Point Estimation
A point estimate is a single statistic, determined from a sample, that is used to
estimate the corresponding population parameter.
Example:
A sample mean x̄ calculated from a random sample x1 , x2 , . . . , xn is a point
estimate of the unknown population mean μ.
Interval Estimation
An interval estimate is a range of values for a population parameter: a point estimate
plus or minus a margin that expresses the uncertainty or variability associated with
the estimate:
estimate ± (critical value of z or t) × (standard error)
Example:
For a given data set, an interval estimate might state that the population mean falls
somewhere between 10 and 100 (10 < μ < 100).
Confidence Interval for Mean
A 100(1 − α)% confidence interval for μ, the population mean, is given by the
interval estimate
x̄ - z_(α/2)*σ/√n ≤ μ ≤ x̄ + z_(α/2)*σ/√n
when the population variance is known.
Interpretation of CI
❖ In repeated sampling, 100(1 − α)% of the confidence intervals constructed this way
contain the true mean of the population
❖ This is not the same as a range that contains 100(1 − α)% (for example 95%) of the
individual data values
Derivation of Confidence Interval (CI) for Mean
The confidence level (1 - α) indicates how confident we are that the population
mean lies within the indicated confidence interval
Example:
If the confidence level is 0.95 then z_(α/2) = 1.96.
We can say that we are 95% confident that
the population mean lies within the interval
x̄ - 1.96*σ/√n ≤ μ ≤ x̄ + 1.96*σ/√n
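The critical value 1.96 comes from the inverse of the standard normal distribution function. As a minimal sketch, the helper below (the name `z_critical` is hypothetical) returns z_(α/2) for any confidence level, using `NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_(alpha/2), the value leaving probability alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: z = {z_critical(confidence):.3f}")
```

A higher confidence level gives a larger critical value and therefore a wider interval.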
Summary of Confidence Interval (CI)
Summary of Confidence Interval (CI) for Mean
Where:
❖ p is the Sample proportion
❖ zα/2 is the Critical value for Confidence level (1 - α) in Standard normal table
❖ n is the Sample size
Sample size determination for a mean
Suppose we wish to estimate a population mean with a maximum allowable margin
of error of ± E.
How to Estimate σ? What if we don't know σ?
❖ Take a Preliminary Sample
→ Take small sample to estimate σ
❖ Assume Uniform Population
→ Estimate upper and lower limits a and b and set σ = [(b - a)^2 / 12]^(1/2)
❖ Assume Normal Population
→ Estimate upper and lower bounds a and b, and set σ = (b - a) / 6
❖ Poisson Arrivals
→ In the special case when λ is a Poisson arrival rate, then σ = √λ.
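The sample size formula itself is not shown in the text above. The sketch below assumes the standard result n = (z_(α/2)·σ/E)^2, rounded up, which follows from requiring the margin of error z_(α/2)·σ/√n to be at most E. The function name and the limits a and b are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (standard formula, assumed here)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Two of the rough estimates of sigma listed above, from guessed limits a and b
a, b = 0, 6000                            # hypothetical lower and upper limits
sigma_uniform = (b - a) / math.sqrt(12)   # uniform population
sigma_normal = (b - a) / 6                # normal population

print(sample_size_for_mean(sigma_normal, margin=200))
print(sample_size_for_mean(sigma_uniform, margin=200))
```

Because the uniform assumption gives a larger σ than the normal assumption for the same limits, it also leads to a larger required sample.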
Sample size determination for a proportion
Suppose we wish to estimate a population proportion with a maximum allowable margin
of error of ± E.
How to Estimate π? What if we don't know π?
❖ Assume that π = 0.5
❖ Take a Preliminary Sample
→ Take small sample to estimate π
❖ Use a Prior Sample or Historical Data
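As with the mean, the formula is not shown in the text above. The sketch below assumes the standard result n = (z_(α/2)/E)^2 · π(1 − π), rounded up. The function name, the margin of 0.03, and the preliminary estimate 0.2 are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, pi=0.5, confidence=0.95):
    """Smallest n with z * sqrt(pi * (1 - pi) / n) <= margin (standard formula, assumed here)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / margin) ** 2 * pi * (1 - pi))

print(sample_size_for_proportion(0.03))          # conservative choice pi = 0.5
print(sample_size_for_proportion(0.03, pi=0.2))  # hypothetical preliminary estimate
```

Setting π = 0.5 maximizes π(1 − π), so it gives the largest (most conservative) sample size for a given margin of error.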
Type I Error & Type II Error
                      H0 is True           H0 is False
Reject H0             Type I error (α)     Correct decision
Fail to reject H0     Correct decision     Type II error (β)
Example:
Let X equal the weight of an infant. A random sample of 10 infants has a sample mean of 2500 grams. The
population follows a normal distribution with a standard deviation of 1000 grams.
Question: Is the mean birth weight in this population different from 3000 grams?
Answer: With 95% confidence, we have:
x̄ - 1.96*σ/√n ≤ μ ≤ x̄ + 1.96*σ/√n
or 2500 - 1.96*1000/√10 ≤ μ ≤ 2500 + 1.96*1000/√10
1880 ≤ μ ≤ 3120
Since this interval contains 3000, we cannot say that the true mean is different from 3000.
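The interval in this example can be computed directly. This is a minimal sketch with a hypothetical helper `z_interval`; it uses the exact critical value from `NormalDist.inv_cdf` rather than the rounded 1.96, so the endpoints may differ slightly in the decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=2500, sigma=1000, n=10)
contains_3000 = low <= 3000 <= high   # True means 3000 is a plausible value for the mean

print(round(low), round(high), contains_3000)
```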
Approaches to two-sided hypothesis testing
Using Critical Value - CV
❖ Calculate a critical value zc (CV) for the specified α
❖ Compute the test statistic zobs (TS)
❖ Reject the null hypothesis if |TS| > |CV| and fail to reject the null if |TS| ≤ |CV|
Example:
Let X equal the weight of an infant. A random sample of 10 infants has a sample mean of 2500 grams. The
population follows a normal distribution with a standard deviation of 1000 grams.
Question: Is the mean birth weight in this population different from 3000 grams?
Answer: With significance level α = 0.05, we have
- zc = 1.96 (recall that 2 × P(Z > |zc|) = 0.05)
- zobs = (x̄ - μ0)/(σ/√n) = (2500 - 3000)/(1000/√10) = -1.58
Because |zobs| < |zc|, we fail to reject the null hypothesis: we cannot say that the true mean is different from 3000
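The critical value approach for this example can be sketched as follows. The variable names are illustrative, and the critical value is taken from `NormalDist.inv_cdf` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0 = 2500, 3000     # sample mean and hypothesized mean
sigma, n = 1000, 10        # known population SD and sample size
alpha = 0.05               # significance level

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
z_obs = (xbar - mu0) / (sigma / sqrt(n))       # observed test statistic
reject_h0 = abs(z_obs) > z_crit                # decision rule |TS| > |CV|

print(round(z_crit, 2), round(z_obs, 2), reject_h0)
```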
p-value
The p-value for a hypothesis test is the probability of obtaining a value of the test
statistic that is as extreme as or more extreme than the observed test statistic when
the null hypothesis is true
Question: Is the mean chair height in this production line different from 40cm?
One-sample Hypothesis Test for a Single Mean
❖ Set up a two-sided test of
- H0: mean = 40cm
- Ha: mean ≠ 40cm
❖ Let the type I error rate be α = 0.05
- Calculate the test statistic
z = (x̄ - μ0)/(σ/√n) = (37.5 - 40)/(5/√10) = -1.58
- What does this mean? Our observed mean is 1.58 standard errors below the
hypothesized mean
- The test statistic is the standardized value of our data assuming the null
hypothesis is true
- Question: if the true mean is 40cm, is our observed sample mean of 37.5cm
“common” or is this value unlikely to occur?
- Calculate the p-value to answer our question: p-value = 2 × P(Z ≤ -1.58) ≈ 0.11
- If the true mean is 40cm, our data, or data more extreme than ours, would
occur in about 11 out of 100 studies (of the same size, n=10)
- General guideline: if the p-value is less than or equal to the type I error rate α,
then reject the null hypothesis
- Conclusion: since the p-value (≈ 0.11) is greater than α = 0.05, we fail to reject
the null hypothesis at the 5% significance level
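The p-value for the chair height example can be reproduced with the standard normal distribution function. This sketch assumes a sample mean of 37.5cm, a known standard deviation of 5cm, and n = 10, which are the values implied by the test statistic of -1.58; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value of a one-sample z test with known sigma."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least as extreme as z_obs, in either tail
    return 2 * NormalDist().cdf(-abs(z_obs))

alpha = 0.05
p_value = two_sided_p_value(xbar=37.5, mu0=40, sigma=5, n=10)  # assumed inputs
reject_h0 = p_value <= alpha

print(round(p_value, 2), reject_h0)
```

Because the p-value exceeds α, the decision matches the critical value approach: the null hypothesis is not rejected.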
Summary of One-sample Hypothesis Testing
Summary of One-sample Hypothesis Testing for One Mean