
Confidence Interval and Hypothesis Testing: Iván Andrés Trujillo Abella

The document discusses confidence intervals (CIs) and hypothesis testing, explaining their purpose, construction, and interpretation. It covers the significance level, the use of sample and population parameters, and the implications of unknown standard deviations. Additionally, it addresses paired and unpaired data, as well as proportion confidence intervals and their applications in statistical analysis.


Confidence interval and hypothesis testing

Iván Andrés Trujillo Abella



Iván Andrés Trujillo ai-page.readthedocs.io 1 / 30


Confidence interval

Aim
Get a range of plausible values for our parameter Θ.

How to read it
With 99% confidence, Θ lies inside our estimated confidence interval.



Confidence?

Consider that the CI is random (it depends on each sample), whereas Θ is fixed.

We use Θ̂ to construct the CI, but the interval is a statement about Θ.
It does not mean that there is a 95% probability that Θ lies in one specific interval.
Confidence means: if you repeat the method (collect data and construct a
CI) with α = 0.05, then out of 100 CIs you expect about 95 of them to capture
the parameter Θ.
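
The repeated-sampling reading of confidence can be checked with a small simulation. The sketch below is only an illustration under assumed values (normal data, known σ, and the hypothetical defaults mu=10, sigma=2, n=30); it counts how often a 1 − α interval for the mean captures the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=10.0, sigma=2.0, n=30, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # quantile of level 1 - alpha/2
    margin = z * sigma / sqrt(n)              # sigma is assumed known here
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


print(coverage())  # expected to be close to 0.95
```

Each simulated interval either contains mu or it does not; only the long-run fraction is close to 1 − α.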



Quantile

Recall that the quantile $z_\alpha$ of a random variable X is defined by
...
$$P(X < z_\alpha) = \alpha \tag{1}$$

Confidence level
$$1 - \alpha, \quad \alpha \in (0, 1) \tag{2}$$
Significance level: α.



Upper and lower bounds

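Given α, we look for two values, one below and one above zero (remember
that the variable is the standard normal Z), such that:
...

$Z_{1-\alpha/2}$
The area between $-Z_{1-\alpha/2}$ and $Z_{1-\alpha/2}$ is equal to 1 − α; each tail outside these bounds has area α/2.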
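
As a quick numerical check of the bounds, the sketch below uses `statistics.NormalDist` from the Python standard library (an assumption of this example; the slides themselves use scipy) to compute the two bounds and the area between them. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_bounds(alpha):
    """Lower and upper bounds -Z_{1-alpha/2} and Z_{1-alpha/2}."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return -z, z


def central_area(alpha):
    """Area under the standard normal curve between the two bounds."""
    lo, hi = z_bounds(alpha)
    return STD_NORMAL.cdf(hi) - STD_NORMAL.cdf(lo)


print(z_bounds(0.05))      # roughly (-1.96, 1.96)
print(central_area(0.05))  # roughly 0.95, that is 1 - alpha
```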



CI

...
$$P\left(-Z_{1-\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le Z_{1-\alpha/2}\right) = 1 - \alpha \tag{3}$$

The interval above is written in terms of the standardized mean; isolating µ gives:


...
$$P\left(\bar{x} - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \tag{4}$$

Note that σ is a population parameter; if n is large, you can use the sample
standard deviation S in its place.
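
A minimal sketch of the interval for the mean with known σ follows. The function name `z_interval` and the sample values are hypothetical, and the standard library is used for the normal quantile.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(data, sigma, alpha=0.05):
    """CI for the mean when the population sigma is known."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{1-alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error
    center = fmean(data)                      # sample mean
    return center - margin, center + margin


# Hypothetical measurements with an assumed known sigma of 0.5
print(z_interval([4.8, 5.1, 5.3, 4.9, 5.4], sigma=0.5))
```

A smaller α gives a larger quantile and therefore a wider interval.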


CI

General
$$\hat{\Theta} \pm \text{margin of error} \tag{5}$$
where Θ̂ is our best estimator.
An important point here is that the confidence interval relies on the
distribution of Θ̂.
...
$$\hat{\Theta} \pm Z_{1-\alpha/2}\,\mathrm{SE}(\hat{\Theta}) \tag{6}$$



Precision - informative

...
If an interval is very wide, then it is not informative!



Unknown σ

In this case we do not know the SE, so we use an Estimated
Standard Error (ESE).

ESE for the mean
$$\mathrm{ESE}(\bar{x}) = \frac{S}{\sqrt{n}} \tag{7}$$
where S is the sample standard deviation.



CI with σ unknown

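...
$$\left(\bar{x} - t_{(1-\alpha/2,\,n-1)}\frac{S}{\sqrt{n}},\; \bar{x} + t_{(1-\alpha/2,\,n-1)}\frac{S}{\sqrt{n}}\right) \tag{8}$$
where $t_{n-1}$ comes from the t distribution with n − 1 degrees of freedom.

Note that as n grows, the t distribution tends to the standard normal Z.

from scipy.stats import norm, t

# 95th percentile of Z and of t with 25 and 100000 degrees of freedom
print(norm.ppf(0.95))       # about 1.645
print(t.ppf(0.95, 25))      # about 1.708, heavier tails than Z
print(t.ppf(0.95, 100000))  # about 1.645, practically equal to Z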
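
Interval (8) can be sketched with the same `scipy.stats.t` quantile function. The function name `t_interval` and the data are hypothetical; the sample standard deviation uses the n − 1 denominator.

```python
from math import sqrt
from statistics import fmean, stdev

from scipy.stats import t


def t_interval(data, alpha=0.05):
    """CI for the mean when sigma is unknown, using S and n - 1 degrees of freedom."""
    n = len(data)
    t_crit = t.ppf(1 - alpha / 2, n - 1)     # t_{(1-alpha/2, n-1)}
    margin = t_crit * stdev(data) / sqrt(n)  # t quantile times the ESE
    center = fmean(data)
    return center - margin, center + margin


print(t_interval([1.0, 2.0, 3.0, 4.0, 5.0]))  # roughly (1.04, 4.96)
```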


Mean difference for paired data

Paired data
Two measurements on the same individual, before and after a treatment.

$$\mu_d = \mu_{post} - \mu_{pre} \tag{9}$$

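...
$$\left(\bar{x}_d - t_{(1-\alpha/2,\,n-1)}\frac{S_d}{\sqrt{n}},\; \bar{x}_d + t_{(1-\alpha/2,\,n-1)}\frac{S_d}{\sqrt{n}}\right) \tag{10}$$
where $t_{n-1}$ comes from the t distribution with n − 1 degrees of freedom, $S_d$ is the sample standard deviation of the differences, and y
is our variable of interest in the dataset, so that
$$\bar{x}_d = \frac{1}{n}\sum_{i=1}^{n}\left(y_{post,i} - y_{pre,i}\right) \tag{11}$$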
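
A minimal sketch of the paired interval: compute the per-individual differences first, then apply the one-sample t interval to them. The function name and the pre/post values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

from scipy.stats import t


def paired_interval(pre, post, alpha=0.05):
    """CI for the mean difference (post - pre) of paired measurements."""
    if len(pre) != len(post):
        raise ValueError("paired data need the same number of observations")
    diffs = [b - a for a, b in zip(pre, post)]   # y_post,i - y_pre,i
    n = len(diffs)
    t_crit = t.ppf(1 - alpha / 2, n - 1)
    margin = t_crit * stdev(diffs) / sqrt(n)     # S_d / sqrt(n) scaled by t
    center = fmean(diffs)                        # x-bar_d
    return center - margin, center + margin


pre = [10.0, 12.0, 9.0, 11.0, 13.0]
post = [12.0, 14.0, 10.0, 13.0, 16.0]
print(paired_interval(pre, post))  # roughly (1.12, 2.88)
```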



Unpaired data

Two approaches
Pooled: $\sigma_A^2 = \sigma_B^2$
Unpooled: $\sigma_A^2 \neq \sigma_B^2$
Use $S_A$ and $S_B$ as approximations to judge which approach is more appropriate.



Unpaired data
Unpooled

SE Unpooled
$$SE = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} \tag{12}$$
Remember that in most practical applications we do not know σ, so we
replace it with S.
...
Use the t distribution to estimate the area, with degrees of freedom given by:
Welch's approximation (see this reference)
or $\min(n_A - 1, n_B - 1)$
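
The unpooled interval can be sketched as below. The degrees of freedom use the Welch-Satterthwaite formula, which is an assumption of this example since the slide only names Welch's approximation; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

from scipy.stats import t


def welch_interval(a, b, alpha=0.05):
    """Unpooled CI for mean(a) - mean(b); returns (low, high, df)."""
    n_a, n_b = len(a), len(b)
    v_a = variance(a) / n_a          # S_A^2 / n_A
    v_b = variance(b) / n_b          # S_B^2 / n_B
    se = sqrt(v_a + v_b)             # equation (12) with S in place of sigma
    # Welch-Satterthwaite degrees of freedom (usually not an integer)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    margin = t.ppf(1 - alpha / 2, df) * se
    diff = fmean(a) - fmean(b)
    return diff - margin, diff + margin, df


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
print(welch_interval(a, b))
```

Replacing `df` with `min(n_a - 1, n_b - 1)` gives the more conservative alternative listed above.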



Unpaired data
Pooled

ESE pooled
$$\mathrm{ESE} = \sqrt{\frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2}}\;\sqrt{\frac{1}{n_A} + \frac{1}{n_B}} \tag{13}$$
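
As one possible starting point, the pooled ESE in (13) can be turned into an interval as sketched below. The choice of $n_A + n_B - 2$ degrees of freedom is the standard one for the pooled case and is an assumption here, since the slide does not state it; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

from scipy.stats import t


def pooled_interval(a, b, alpha=0.05):
    """Pooled CI for mean(a) - mean(b), assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled standard deviation: weighted average of the two sample variances
    s_pooled = sqrt(((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df)
    ese = s_pooled * sqrt(1 / n_a + 1 / n_b)   # equation (13)
    margin = t.ppf(1 - alpha / 2, df) * ese
    diff = fmean(a) - fmean(b)
    return diff - margin, diff + margin


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pooled_interval(a, b))  # roughly (-6.65, 0.65)
```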

Exercise
Construct a program to calculate pooled and unpooled intervals.



laboratories CI
mean

...
CI(mean) simulation
CI(mean) real data



Laboratories

First
Central limit theorem



CI
Θ

...
Given that the limits are random, the interval is random.
...
Seeing-theory



Confidence interval

What is a confidence interval useful for?

Now assume that α ∈ (0, 1)

$$P(\hat{\theta}_{low} < \theta < \hat{\theta}_{upper}) = 1 - \alpha. \tag{14}$$


Note that the interval is also random.



CI

Remember that the CI aims to capture θ, which is fixed. It is therefore a mistake to say that
the parameter falls inside the interval (1 − α) · 100% of the time (this is a
common mistake). How do we find $\hat{\theta}_{low}$ and $\hat{\theta}_{upper}$?
A better statement is that the probability that the (random) interval contains the
parameter is 1 − α.



Interpretation of CI

If the interval is built in n repeated samplings, then in about (1 − α) · n of them the
interval contains θ.



CI(mean) with known σ²

$X_n \sim N(\mu, \sigma^2)$, then
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \tag{15}$$

$$P\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 \tag{16}$$

After some algebraic manipulation of the inequalities we have:

$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \tag{17}$$
See simulation here: CI simulation (click)
Which is the pivotal quantity?
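
For reference, the algebraic steps that lead from (16) to (17) can be written out as follows. This is a worked sketch using the same symbols as the slide: multiply through by $\sigma/\sqrt{n}$, subtract $\bar{x}$, then multiply by −1, which reverses the inequalities.

```latex
\begin{align*}
-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96
  &\iff -1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 1.96\frac{\sigma}{\sqrt{n}} \\
  &\iff -\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \\
  &\iff \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}
\end{align*}
```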



The value $Z_{\alpha/2}$ is the point for which the area to its right (under the normal
curve) is α/2. It is also the quantile of level 1 − α/2, that is, the value that leaves
an area of 1 − α/2 to its left.



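Confidence interval for a proportion P

$$\hat{P} = \frac{\sum x_i}{n} \tag{18}$$
where $x_i$ is 1 for a success and 0 otherwise, so $\sum x_i$ is the number of successes. Using the central limit
theorem we have that, approximately for large n,
$$\frac{\hat{P} - P}{\sqrt{\hat{P}(1 - \hat{P})/n}} \sim N(0, 1) \tag{19}$$

Interval
$$\left(\hat{P} - Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}},\; \hat{P} + Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}\right) \tag{20}$$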
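
The proportion interval can be sketched as below, taking the number of successes and the sample size as inputs. The function name and the counts are hypothetical, and the standard library supplies the normal quantile.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, alpha=0.05):
    """Normal-approximation CI for a proportion, as in equation (20)."""
    p_hat = successes / n                      # equation (18)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{1-alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


print(proportion_interval(40, 100))  # roughly (0.304, 0.496)
```

This approximation is only reasonable for a random sample that is large enough.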


Work

...
Analyze how the results change for α = 0.01



Considerations

It is important to remember that the sample must be random.

Keep in mind that when you use $z_{f(\alpha)}$ the sample size should satisfy n ≥ 30.



laboratories CI
Proportion

...
CI(proportion) simulation
CI(proportion) real data



Difference in proportions

In two populations (A, B) that present a feature ϕ, determine if


...
$$P_A(\phi) - P_B(\phi) \tag{21}$$
The difference in the proportion of subjects or objects in A and B that
present ϕ.
...
$$\hat{P}_A - \hat{P}_B \pm Z_{1-\alpha/2}\,SE(\hat{P}_A - \hat{P}_B) \tag{22}$$



Where does the SE come from?

...
$$SE(\hat{P}_A - \hat{P}_B) = \sqrt{\frac{\hat{P}_A(1 - \hat{P}_A)}{n_A} + \frac{\hat{P}_B(1 - \hat{P}_B)}{n_B}} \tag{23}$$

How to interpret them?

If the interval (L, U) is entirely positive, we say: we are 95%
confident that $P_A$ exceeds $P_B$ by between L and U.
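
A minimal sketch of the interval for a difference in proportions, combining the point estimate with the SE formula. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_interval(x_a, n_a, x_b, n_b, alpha=0.05):
    """CI for P_A - P_B from success counts x and sample sizes n."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Standard error of the difference of two independent sample proportions
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


low, high = diff_proportion_interval(60, 100, 45, 100)
print(low, high)  # roughly 0.013 and 0.287
```

In this made-up example both limits are positive, so the interval is read as described above.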

Exercise
With the following dataset, determine the confidence interval for the difference in
proportions between males and females whose score in any module
is in Q3.



Considerations

What happens if 0 is in the interval of differences? Remember that random samples are
necessary here and that the samples must be large; this last
requirement is

$$n_i \hat{P}_i \ge 10 \quad \text{for } i = A, B \tag{24}$$

$$n_i (1 - \hat{P}_i) \ge 10 \quad \text{for } i = A, B \tag{25}$$
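
The two large-sample conditions can be checked before building the interval. The sketch below is a hypothetical helper; it works with counts, since $n_i \hat{P}_i$ is the number of successes and $n_i(1 - \hat{P}_i)$ the number of failures.

```python
def large_sample_ok(successes, n, threshold=10):
    """True when both n * p_hat and n * (1 - p_hat) reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


def both_groups_ok(x_a, n_a, x_b, n_b, threshold=10):
    """Check the conditions for i = A and i = B."""
    return large_sample_ok(x_a, n_a, threshold) and large_sample_ok(x_b, n_b, threshold)


print(both_groups_ok(60, 100, 45, 100))  # True
print(both_groups_ok(60, 100, 5, 100))   # False: only 5 successes in B
```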
