Confidence: ECON 226 - J L. G
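The object census_data is filtered in the next cell but never created above. A minimal sketch, assuming the data come from a Stata census extract read with haven (the filename below is hypothetical):

library(haven)
# hypothetical filename -- substitute the census extract used in the course
census_data <- read_dta("census_2016.dta")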
library(haven)
library(tidyverse)
census_data <- census_data %>%   # drop observations with missing income or wages
  filter(!is.na(mrkinc)) %>%
  filter(!is.na(wages))
We can use the CDF to find an interval [a, b] that contains Y with a given probability β.
Why? Well:
FY (b) = P (Y ≤ b) and FY (a) = P (Y ≤ a) .
Therefore FY (b) − FY (a) = P (a ≤ Y ≤ b) .
In [4]:
# Example: an arbitrary distribution (Gamma(2,2))
# You don't need to know about this distribution; it's just an example
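The objects used in the next few cells (pdf_g, cdf_g, and the base plot f) are not defined above. A minimal sketch consistent with the output below (a Gamma distribution with shape 2 and scale 2):

library(ggplot2)
pdf_g <- function(x) dgamma(x, shape = 2, scale = 2)   # Gamma(2,2) density
cdf_g <- function(x) pgamma(x, shape = 2, scale = 2)   # Gamma(2,2) CDF
f <- ggplot(data.frame(x = c(0, 20)), aes(x = x)) +    # base plot of the density
  stat_function(fun = pdf_g) +
  labs(x = "y", y = "density")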
In [5]:
# Our goal is to find two points (a, b) such that the shaded region is 95%
a <- 2
b <- 12
g <- f + stat_function(fun = pdf_g, xlim = c(a, b), geom = "area", fill = "blue", alpha = 0.1)
g   # display the shaded plot
In [6]:
# This is CDF(b) - CDF(a)
# It's not 95% yet -- adjust a and b so it is!
cdf_g(b) - cdf_g(a)
0.71840761710622
Many Possible Intervals
So, $F_Y(a) = 1 - \beta \iff a = F_Y^{-1}(1 - \beta)$

So, $F_Y(b) = \beta \iff b = F_Y^{-1}(\beta)$
# The same shaded-area plot for a different choice of endpoints (a, b)
g <- f + stat_function(fun = pdf_g, xlim = c(a, b), geom = "area", fill = "blue", alpha = 0.1)
cdf_g(b) - cdf_g(a)
0.710723021397324
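One way to make the shaded region exactly 95% (not shown in the original cell) is to split the remaining 5% evenly between the two tails and invert the CDF, exactly as the formulas above suggest:

# put 2.5% of the probability in each tail by inverting the Gamma(2,2) CDF
a <- qgamma(0.025, shape = 2, scale = 2)
b <- qgamma(0.975, shape = 2, scale = 2)
cdf_g(b) - cdf_g(a)   # 0.95 by construction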
Example 2: Standard Normal
P (Y ≤ b) = Φ(b) = 0.975
So, we can just find this by inverting the normal CDF, which requires a calculator. Good news! R has one for us.
In [8]:
## Example 1
a <- qnorm(0.025)   # inverse standard normal CDF at 2.5%
b <- qnorm(0.975)   # inverse standard normal CDF at 97.5%
c(a,b)
1. -1.95996398454005
2. 1.95996398454005
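The next cell shades an area under a curve pdf_n on a plot f, but neither object is defined above. A minimal sketch, assuming they are the standard normal density and a base plot of it:

pdf_n <- function(x) dnorm(x)                        # standard normal density
f <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +  # base plot of the density
  stat_function(fun = pdf_n) +
  labs(x = "z", y = "density")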
In [9]:
# plotting the picture
f <- f + stat_function(fun = pdf_n, xlim = c(a, b), geom = "area", fill = "blue", alpha = 0.1)
Standard Normal CI
You will notice that this also gives a convenient formula we can derive. If Z has a standard normal distribution, then

$$\left[\Phi^{-1}\!\left(\tfrac{\alpha}{2}\right),\; -\Phi^{-1}\!\left(\tfrac{\alpha}{2}\right)\right]$$

is a (1 − α)% CI!

Remember that if X is normal with mean μX and standard deviation σX, then Z ≡ (X − μX)/σX is a standard normal.

This also implies that if Z is standard normal, then X ≡ (σX ⋅ Z + μX) is normal with mean μX and standard deviation σX.

So, taking Za = Φ⁻¹(α/2) and Zb = −Φ⁻¹(α/2):

$$X_a = \sigma_X \cdot Z_a + \mu_X = \mu_X + \Phi^{-1}\!\left(\tfrac{\alpha}{2}\right)\sigma_X$$

$$X_b = \sigma_X \cdot Z_b + \mu_X = \mu_X - \Phi^{-1}\!\left(\tfrac{\alpha}{2}\right)\sigma_X$$

$$\implies X_{a,b} = \mu_X \pm Z^{*}_{\alpha/2}\,\sigma_X$$

where Z*α/2 ≡ −Φ⁻¹(α/2) is the usual critical value.
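A quick numeric check of this formula (not part of the original notes): for any μX and σX, the interval μX ± Z*α/2 σX should contain exactly 1 − α of the probability. Here with a hypothetical mean of 7 and standard deviation of 3:

mu <- 7
sigma <- 3
a <- mu + qnorm(0.025)*sigma   # lower endpoint of a 95% interval
b <- mu + qnorm(0.975)*sigma   # upper endpoint
pnorm(b, mu, sigma) - pnorm(a, mu, sigma)   # = 0.95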
Conclusions
Since the sample mean X̄ is (at least approximately) normal with mean μX̄ = μX and standard deviation σX̄ = σX/√n, we can use the same logic to construct a CI for x̄:

$$\implies \bar{X}_{a,b} = \mu_{\bar{X}} \pm Z^{*}_{\alpha/2}\,\sigma_{\bar{X}}$$

$$\implies \bar{X}_{a,b} = \mu_X \pm Z^{*}_{\alpha/2}\,\frac{\sigma_X}{\sqrt{n}}$$
Interpretation
The interval [X̄a, X̄b] is constructed so that it contains the sample mean with the chosen probability (here, 95% of the time).
mean = 15
std = 10
n = 10
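The objects pdf_n2, cdf_n2, and f used in the next cell are not defined above. A sketch consistent with the outputs, treating them as the sampling distribution of the mean (normal with mean 15 and standard deviation std/√n):

pdf_n2 <- function(x) dnorm(x, mean, std/sqrt(n))   # density of the sample mean
cdf_n2 <- function(x) pnorm(x, mean, std/sqrt(n))   # CDF of the sample mean
f <- ggplot(data.frame(x = c(5, 25)), aes(x = x)) +
  stat_function(fun = pdf_n2) +
  labs(x = "sample mean", y = "density")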
In [11]:
a <- mean + qnorm(0.025)*std/sqrt(n)
b <- mean + qnorm(0.975)*std/sqrt(n)   # upper endpoint (not shown above; implied by the output)
c(a,b)
g <- f + stat_function(fun = pdf_n2, xlim = c(a, b), geom = "area", fill = "blue", alpha = 0.1)
cdf_n2(b) - cdf_n2(a)
1. 8.80204967695439
2. 21.1979503230456
0.95
Step 2a: Relaxing Assumptions about σX
In practice we usually don't know σX; we estimate it with the sample standard deviation sX. When we do that, the standardized sample mean follows a t-distribution with n − 1 degrees of freedom instead of a standard normal.
You can think of it as a "correction" of the normal distribution when n is small:
It's still symmetrical about the mean
It has a mean of zero
As n → ∞ it becomes normal
This last property is why we didn't notice for a long time! In fact, for n > 120 there is basically no (computable) difference.
In [40]:
## Comparisons
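The code for this comparison isn't shown above. Assuming the point is to contrast t and normal critical values as n grows, one way to do it is:

# compare the 97.5% critical values of the t distribution (df = n - 1) and the normal
ns <- c(5, 10, 30, 120, 1000)
data.frame(
  n = ns,
  t_crit = qt(0.975, df = ns - 1),   # shrinks toward the normal value as n grows
  normal_crit = qnorm(0.975)
)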
Bottom Line
To build a CI when σX is unknown, we use the t critical values:

$$\bar{X}_{a,b} = \bar{x} \pm t^{*}_{n-1}\!\left(\tfrac{\alpha}{2}\right)\frac{s_X}{\sqrt{n}}$$

where $t^{*}_{n-1}(a)$ is the critical value associated with $a$ (i.e. the inverse of the CDF of the $t_{n-1}$ distribution).
In [8]:
# Example
mean = 0
std = 4
n = 1000
x <- rnorm(n, mean, std)   # simulate a sample (this draw is not shown above)
s <- sd(x)                 # sample standard deviation (about 3.8 in the run shown below)
v = s/sqrt(n)              # standard error of the sample mean
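The objects pdf_t and cdf_t (and the base plot f) used in the next two cells are not defined above. A sketch consistent with the outputs below, treating them as the density and CDF of the sample mean under a t distribution with n − 1 degrees of freedom scaled by s/√n:

# hypothetical reconstruction: distribution of the sample mean around `mean`,
# a t with n - 1 degrees of freedom scaled by the standard error s/sqrt(n)
pdf_t <- function(x) dt((x - mean)/(s/sqrt(n)), df = n - 1)/(s/sqrt(n))
cdf_t <- function(x) pt((x - mean)/(s/sqrt(n)), df = n - 1)
f <- ggplot(data.frame(x = c(-0.75, 0.75)), aes(x = x)) +
  stat_function(fun = pdf_t) +
  labs(x = "sample mean", y = "density")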
In [9]:
a <- mean + qnorm(0.025)*s/sqrt(n)
b <- mean + qnorm(0.975)*s/sqrt(n)   # upper endpoint (not shown above; implied by the output)
c(a,b)
cdf_t(b) - cdf_t(a)   # coverage under the t distribution: slightly less than 95%
g <- f + stat_function(fun = pdf_t, xlim = c(a, b), geom = "area", fill = "blue", alpha = 0.1)
1. -0.235522112275733
2. 0.235522112275733
0.949722321182881
In [10]:
a2 <- mean + qnorm(0.025)*s/sqrt(n)
b2 <- mean + qnorm(0.975)*s/sqrt(n)    # normal-based endpoints (b2 reconstructed)
a <- mean + qt(0.025, n-1)*s/sqrt(n)   # t-based endpoints, df = n - 1 (reconstructed from the outputs)
b <- mean + qt(0.975, n-1)*s/sqrt(n)
c(a,b)
cdf_t(b) - cdf_t(a)
# normal values
c(a2,b2)
cdf_t(b2) - cdf_t(a2)
1. -0.23580780543825
2. 0.23580780543825
0.95
1. -0.235522112275733
2. 0.235522112275733
0.949722321182881
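As a cross-check (not in the original notes), R's built-in t.test() produces the same kind of t-based interval, centered at the sample mean of the simulated data x sketched above rather than at the true mean:

t.test(x, conf.level = 0.95)$conf.int   # x-bar +/- qt(0.975, n - 1) * s/sqrt(n)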
Conclusions
You will also notice that the t-based intervals are larger.

As before, we can use this to construct a CI for x̄:

$$\bar{X}_{a,b} = \mu_X \pm t_{n-1}\!\left(\tfrac{\alpha}{2}\right)\frac{s_X}{\sqrt{n}}$$

$$\bar{X}_{a,b} = \bar{x} \pm t_{n-1}\!\left(\tfrac{\alpha}{2}\right)\frac{s_x}{\sqrt{n}}$$

We call the term $t_{n-1}\!\left(\tfrac{\alpha}{2}\right)\frac{s_x}{\sqrt{n}}$ the "margin of error" (ME).

If the actual x̄ is within one ME of μX, then it lies inside the CI about μX; equivalently, μX lies inside the CI about x̄.
mu <- 50000
k <- 10000                     # number of simulated sample means
se <- 20000/sqrt(n)            # standard error (n as defined above)
me <- qnorm(0.975)*se          # margin of error (definition not shown above)
wbars <- mu + rnorm(k)*se      # draw 10,000 sample means (rnorm generates a standard normal variable)
contains <- (mu < wbars + me) & (mu > wbars - me)  # check if mu is within 1 ME of wbar
mean(contains)                 # share of intervals that contain mu
0.9527
Worked Example: CI for Average Wages
n <- nrow(census_data)
w_bar <- mean(census_data$wages)   # sample mean of wages (assignment cut off above)
s <- sd(census_data$wages)
se <- s/sqrt(n)
n
w_bar
s
se
343063
54482.5211637513
64275.2748093883
109.737990091199
In [9]:
# Step 2a: Find critical values using normal approximation directly and build CI
a <- w_bar + qnorm(0.005)*se   # a and b not shown above; the outputs imply a 99% CI
b <- w_bar + qnorm(0.995)*se
CI <- c(a,b)
CI
1. 54199.8548331618
2. 54765.1874943407
In [10]:
# Step 2b: Find critical values by normalizing and using formula
a <- w_bar + qt(0.005, n-1)*se   # t-based critical values (reconstructed; implied by the outputs)
b <- w_bar + qt(0.995, n-1)*se
CI <- c(a,b)
CI
1. 54199.8532604581
2. 54765.1890670444
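As a final check (not in the original notes), the same interval can be obtained directly from t.test(), assuming the 99% confidence level implied by the critical values above:

t.test(census_data$wages, conf.level = 0.99)$conf.int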